Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 7:51 AM, Andres Freund  wrote:
>> I wondered that, too, but it's not well-defined for all tuples.  What
>> happens if you pass in a constructed tuple rather than an on-disk tuple?
>
> Those should be discernible I think, t_self/t_tableOid won't be set for
> generated tuples.

I went looking for existing precedent for code that does things like
this and found record_out, which does this:

HeapTupleHeader rec = PG_GETARG_HEAPTUPLEHEADER(0);
...
/* Extract type info from the tuple itself */
tupType = HeapTupleHeaderGetTypeId(rec);
tupTypmod = HeapTupleHeaderGetTypMod(rec);
tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
ncolumns = tupdesc->natts;

/* Build a temporary HeapTuple control structure */
tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
ItemPointerSetInvalid(&(tuple.t_self));
tuple.t_tableOid = InvalidOid;
tuple.t_data = rec;

This appears to be a typical pattern, although interestingly I noticed
that row_to_json() doesn't bother setting t_tableOid or t_self, which
I think it's supposed to do.  The problem I see here is that this code
seems to imply that a function passed a record doesn't actually have
enough information to know what sort of a thing it's getting.  The use
of HeapTupleHeaderGetTypeId and HeapTupleHeaderGetTypMod implies that
it's safe to assume that t_choice will contain DatumTupleFields rather
than HeapTupleFields, which doesn't seem to bode well for your
approach.

Am I missing a trick?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XML Issue with DTDs

2013-12-20 Thread Florian Pflug
On Dec20, 2013, at 18:52 , Robert Haas  wrote:
> On Thu, Dec 19, 2013 at 6:40 PM, Florian Pflug  wrote:
>> Solving this seems a bit messy, unfortunately. First, I think we need to 
>> have some XMLOPTION value which is a superset of all the others - otherwise, 
>> dump & restore won't work reliably. That means either allowing DTDs if 
>> XMLOPTION is CONTENT, or inventing a third XMLOPTION, say ANY.
> 
> Or we can just decide that it was a bug that this was ever allowed,
> and if you upgrade to $FIXEDVERSION you'll need to sanitize your data.
> This is roughly what we did with encoding checks.

What exactly do you suggest we outlaw? If there are XML values which
are CONTENT but not a DOCUMENT, and other values which are a DOCUMENT
but not CONTENT, then what is pg_restore supposed to set XMLOPTION
to?

best regards,
Florian Pflug





Re: [HACKERS] make_interval ??

2013-12-20 Thread Josh Berkus
On 12/20/2013 04:44 PM, Gavin Flower wrote:
> On 21/12/13 13:40, Josh Berkus wrote:
>> On 12/20/2013 03:09 PM, Gavin Flower wrote:
>>> What about leap years?
>> What about them?
>>
> some years have 365 days, others have 366, so how many days are in an
> interval of 2 years? 4 years?

Your question isn't relevant to this patch.  It's not defining the
interval type, just creating an alternate constructor for it.

(the answer is, it depends on what timestamp you're adding it to ...)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] make_interval ??

2013-12-20 Thread Gavin Flower

On 21/12/13 13:40, Josh Berkus wrote:

On 12/20/2013 03:09 PM, Gavin Flower wrote:

What about leap years?

What about them?

some years have 365 days, others have 366, so how many days are in an
interval of 2 years? 4 years?





Re: [HACKERS] make_interval ??

2013-12-20 Thread Josh Berkus
On 12/20/2013 03:09 PM, Gavin Flower wrote:
> What about leap years?

What about them?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] nested hstore patch

2013-12-20 Thread Andres Freund
On 2013-12-20 15:16:30 -0800, David E. Wheeler wrote:
> But for the hstore feature itself, I think the current interface and features 
> are ready to go.

I think this patch needs a significant amount of work before it can be
considered ready for committer. I found the list of issues in
http://archives.postgresql.org/message-id/20131118163633.GE20305%40awork2.anarazel.de
within 10 minutes, indicating that it clearly cannot be ready yet.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] nested hstore patch

2013-12-20 Thread David E. Wheeler
On Nov 12, 2013, at 10:35 AM, Teodor Sigaev  wrote:

> Hi!
> 
> Attached patch adds nesting feature, types (string, bool and numeric 
> values), arrays and scalars to the hstore type.

My apologies for not getting to this sooner, work has been a bit nutty. The 
truth is that I reviewed this patch quite a bit a month back, mostly so I could 
write documentation, the results of which are included in this patch. And I'm 
super excited for what's to come in the next iteration, as I hear that Teodor 
and Andrew are hard at work adding jsonb as a binary-compatible JSON data type.

Meanwhile, for this version, a quick overview of what has changed since 9.2.

Contents & Purpose
==================

Improved Data Type Support
--------------------------

* Added data type support for values. Previously they could only be strings or 
NULL, but with this patch they can also be numbers or booleans.

* Added array support. Values can be arrays of other values. The format for 
arrays is a bracketed, comma-delimited list.

* Added nesting support. hstore values can themselves be hstores. Nested 
hstores are wrapped in braces, but the root-level hstore is not (for 
compatibility with the format of previous versions of hstore).

* An hstore value is no longer required to be an hstore object. It can now be 
any scalar value.

These items make the basic format feature-complete with JSON. Here's an 
example where the values are scalars:

=% SELECT 'foo'::hstore, '"hi \"bob\""'::hstore, '1.0'::hstore, 
'true'::hstore, NULL::hstore;
 hstore |    hstore    | hstore | hstore | hstore 
--------+--------------+--------+--------+--------
 "foo"  | "hi \"bob\"" | 1.0    | t      | 

And here are a couple of arrays with strings, numbers, booleans, and NULLs:

SELECT '[k,v]'::hstore, '[1.0, "hi there", false, null]'::hstore;
   hstore   |           hstore           
------------+----------------------------
 ["k", "v"] | [1.0, "hi there", f, NULL]

Here's a complicated example formatted with `hstore.pretty_print` enabled.

=% SET hstore.pretty_print=true;
=% SELECT '{
  "type" => "Feature",
  "bbox" => [-180.0, -90.0, 180.0, 90.0],
  "geometry" => {
"type" => "Polygon",
"coordinates" => [[
  [-180.0, 10.0], [20.0, 90.0], [180.0, -5.0], [-30.0, -90.0]
  ]]
}
}'::hstore;
            hstore            
------------------------------
 "bbox"=>+
 [   +
 -180.0, +
 -90.0,  +
 180.0,  +
 90.0+
 ],  +
 "type"=>"Feature",  +
 "geometry"=>+
 {   +
 "type"=>"Polygon",  +
 "coordinates"=> +
 [   +
 [   +
 [   +
 -180.0, +
 10.0+
 ],  +
 [   +
 20.0,   +
 90.0+
 ],  +
 [   +
 180.0,  +
 -5.0+
 ],  +
 [   +
 -30.0,  +
 -90.0   +
 ]   +
 ]   +
 ]   +
 }

So, exact feature parity with the JSON data type.

* hstore.pretty_print is a new GUC, specifically to allow an HSTORE value to be 
pretty-printed. There is also a function to pretty-print, so we might be able 
to just do away with the GUC.

Interface
-

* New operators:
  + `hstore -> int`:     Get string value at array index (starting at 0)
  + `hstore ^> text`:    Get numeric value for key
  + `hstore ^> int`:     Get numeric value at array index
  + `hstore ?> text`:    Get boolean value for key
  + `hstore ?> int`:     Get boolean value at array index
  + `hstore #> text[]`:  Get string value for key path
  + `hstore #^> text[]`: Get numeric value for key path
  + `hstore #?> text[]`: Get boolean value for key path
  + `hstore %> text`:    Get hstore value for key
  + `hstore %> int`:     Get hstore value at array index
  + `hstore #%> text[]`: Get hstore value for key path
  + `hstore ? int`:      Does hstore contain array index
  + `hstore #? text[]`:  Does hstore contain key path
  + `hstore - int`:      Delete index from left operand
  + `hstore #- text[]`:  Delete key path from left operand

* New functions:
  + `hstore(text)`: Make a text scalar hstore
  + `hstore(numeric)`:  Make a numeric scalar hstore
  + `hstore(boolean)`:  Make a boolean scalar hstore
  + `hstore(text, hstore)`: Make a nested hstore

Re: [HACKERS] make_interval ??

2013-12-20 Thread Gavin Flower

On 21/12/13 06:29, Josh Berkus wrote:

Pavel,


So constructor should to look like:

CREATE OR REPLACE FUNCTION make_interval(years int DEFAULT 0, months int
DEFAULT 0, ...)

and usage:

SELECT make_interval(years := 2)
SELECT make_interval(days := 14)

Is there any interest in this (or a similar) function?

It would certainly make our Python users happy.

And for that matter would get rid of this kind of stupid thing in stored
procedure code:

time_ahead := ( interval '1 minute' * var_skip );

So, +1 for the feature.


What about leap years?

Cheers,
Gavin





Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-20 Thread Peter Geoghegan
On Fri, Dec 20, 2013 at 12:39 PM, Heikki Linnakangas
 wrote:
> Hmm. If I understand the problem correctly, it's that as soon as another
> backend sees the tuple you've inserted and calls XactLockTableWait(), it
> will not stop waiting even if we later decide to kill the already-inserted
> tuple.

Forgive me for being pedantic, but I wouldn't describe it that way.
Quite simply, the tuples speculatively inserted (and possibly later
deleted) are functionally value locks, that presently cannot be easily
released (so my point is it doesn't matter if you're currently waiting
on XactLockTableWait() or are just about to). I have to wonder about
the performance implications of fixing this, even if we suppose the
fix is itself inexpensive. The current approach probably benefits from
not having to re-acquire value locks from previous attempts, since
everyone still has to care about "value locks" from our previous
attempts.

The more I think about it, the more opposed I am to letting this
slide, which is a notion I had considered last night, if only because
MySQL did so for many years. This is qualitatively different from
other cases where we deadlock. Even back when we exclusive locked rows
as part of foreign key enforcement, I think it was more or less always
possible to do an analysis of the dependencies that existed, to ensure
that locks were acquired in a predictable order so that deadlocking
could not occur. Now, maybe that isn't practical for an entire app,
but it is practical to do in a localized way as problems emerge. In
contrast, if we allowed unprincipled deadlocking, the only advice we
could give is "stop doing so much upserting".


-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-20 Thread Heikki Linnakangas

On 12/20/2013 10:56 PM, Alvaro Herrera wrote:

Robert Haas escribió:

On Fri, Dec 20, 2013 at 3:39 PM, Heikki Linnakangas
 wrote:

Hmm. If I understand the problem correctly, it's that as soon as another
backend sees the tuple you've inserted and calls XactLockTableWait(), it
will not stop waiting even if we later decide to kill the already-inserted
tuple.

One approach to fix that would be to release and immediately re-acquire the
transaction-lock, when you kill an already-inserted tuple. Then teach the
callers of XactLockTableWait() to re-check if the tuple is still alive.


That particular mechanism sounds like a recipe for unintended consequences.


Yep, what I thought too.

There are probably other ways to make that general idea work though.  I
didn't follow this thread carefully, but is the idea that there would be
many promise tuples "live" at any one time, or only one?  Because if
there's only one, or a very limited number, it might be workable to
sleep on that tuple's lock instead of the xact's lock.


Only one.

heap_update() and heap_delete() also grab a heavy-weight lock on the 
tuple, before calling XactLockTableWait(). _bt_doinsert() does not, but 
it could. Perhaps we can take advantage of that.


- Heikki




Re: [HACKERS] shared memory message queues

2013-12-20 Thread Andres Freund
Hi,

On 2013-12-18 15:23:23 -0500, Robert Haas wrote:
> It sounds like most people who have looked at this stuff are broadly
> happy with it, so I'd like to push on toward commit soon, but it'd be
> helpful, Andres, if you could review the comment additions to
> shm-mq-v2.patch and see whether those address your concerns.  If so,
> I'll see about improving the overall comments for shm-toc-v1.patch as
> well to clarify the points that seem to have caused a bit of
> confusion; specific thoughts on what ought to be covered there, or any
> other review, is most welcome.

Some things:

* shm_mq_minimum_size should be const

* the MAXALIGNing in shm_mq_create looks odd. We're computing
  data_offset using MAXALIGN and determining the ring size from it, but then
  set mq_ring_offset to data_offset - offsetof(shm_mq, mq_ring)?

* same problem as the toc stuff with a dynamic number of spinlocks.

* you don't seem to to initialize the spinlock anywhere. That's clearly
  not ok, independent of the spinlock implementation.

* The comments around shm_mq/shm_mq_handle are a clear improvement. I'd
  be pretty happy if you additionally add something akin to 'This struct
  describes the shared state of a queue' and 'Backend-local reference to
  a queue'.

* s/MQH_BUFSIZE/MQH_INITIAL_BUFSIZE/, I was already wondering whether
  there's a limit on the max number of bytes.

* I think shm_mq_attach()'s documentation should mention that we'll
  initially and later allocate memory in the current
  CurrentMemoryContext when attaching. That will likely require some
  care for some callers. Yes, it's documented in a bigger comment
  somewhere else, but still.

* I don't really understand the mqh_consume_pending stuff. If we just
  read a message spanning from the beginning to the end of the
  ringbuffer, without wrapping, we might not confirm the read in
  shm_mq_receive() until the next time it is called? Do I understand that
  correctly?
I am talking about the following and other similar pieces of code:
+   if (rb >= needed)
+   {
+       /*
+        * Technically, we could consume the message length information at
+        * this point, but the extra write to shared memory wouldn't be
+        * free and in most cases we would reap no benefit.
+        */
+       mqh->mqh_consume_pending = needed;
+       *nbytesp = nbytes;
+       *datap = ((char *) rawdata) + MAXALIGN64(sizeof(uint64));
+       return SHM_MQ_SUCCESS;
+   }

* ISTM the messages around needing to use the same arguments for
  send/receive again after SHM_MQ_WOULD_BLOCK could stand to be more
  forceful. "In this case, the caller should call this function again
  after the process latch has been set." doesn't make it all that clear
  that failure to do so will corrupt the next message. Maybe there also
  should be an assert checking that the size is the same when retrying
  as when starting a partial read?

* You have some CHECK_FOR_INTERRUPTS(), are you sure the code is safe to
  longjmp out from there in all the cases? E.g. I am not sure it is for
  shm_mq_send_bytes(), when we've first written a partial message, done
  a shm_mq_inc_bytes_written() and then go into the available == 0
  branch and jump out.

* Do I understand it correctly that mqh->mqh_counterparty_attached is
  just sort of a cache, and we'll still detect a detached counterparty
  when we're actually sending/receiving?

That's it for now.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-20 Thread Alvaro Herrera
Robert Haas escribió:
> On Fri, Dec 20, 2013 at 3:39 PM, Heikki Linnakangas
>  wrote:
> > Hmm. If I understand the problem correctly, it's that as soon as another
> > backend sees the tuple you've inserted and calls XactLockTableWait(), it
> > will not stop waiting even if we later decide to kill the already-inserted
> > tuple.
> >
> > One approach to fix that would be to release and immediately re-acquire the
> > transaction-lock, when you kill an already-inserted tuple. Then teach the
> > callers of XactLockTableWait() to re-check if the tuple is still alive.
> 
> That particular mechanism sounds like a recipe for unintended consequences.

Yep, what I thought too.

There are probably other ways to make that general idea work though.  I
didn't follow this thread carefully, but is the idea that there would be
many promise tuples "live" at any one time, or only one?  Because if
there's only one, or a very limited number, it might be workable to
sleep on that tuple's lock instead of the xact's lock.

Another thought is to have a different LockTagType that signals a
transaction that's doing the INSERT/ON DUPLICATE thingy, and remote
backends sleep on that instead of the regular transaction lock.  That
different lock type could be released and reacquired as proposed by
Heikki above without danger of unintended consequences.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 3:39 PM, Heikki Linnakangas
 wrote:
> Hmm. If I understand the problem correctly, it's that as soon as another
> backend sees the tuple you've inserted and calls XactLockTableWait(), it
> will not stop waiting even if we later decide to kill the already-inserted
> tuple.
>
> One approach to fix that would be to release and immediately re-acquire the
> transaction-lock, when you kill an already-inserted tuple. Then teach the
> callers of XactLockTableWait() to re-check if the tuple is still alive.

That particular mechanism sounds like a recipe for unintended consequences.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-20 Thread Heikki Linnakangas

On 12/20/2013 06:06 AM, Peter Geoghegan wrote:

On Wed, Dec 18, 2013 at 8:39 PM, Peter Geoghegan  wrote:

Empirically, retrying because ExecInsertIndexTuples() returns some
recheckIndexes occurs infrequently, so maybe that makes all of this
okay. Or maybe it happens infrequently *because* we don't give up on
insertion when it looks like the current iteration is futile. Maybe
just inserting into every unique index, and then blocking on an xid
within ExecCheckIndexConstraints(), works out fairly well and performs
reasonably in all common cases. It's pretty damn subtle, though, and I
worry about the worst case performance, and basic correctness issues
for these reasons.


I realized that it's possible to create the problem that I'd
previously predicted with "promise tuples" [1] some time ago, that are
similar in some regards to what Heikki has here. At the time, Robert
seemed to agree that this was a concern [2].

I have a very simple testcase attached, much simpler that previous
testcases, that reproduces deadlock for the patch
exclusion_insert_on_dup.2013_12_12.patch.gz at scale=1 frequently, and
occasionally when scale=10 (for tiny, single-statement transactions).
With scale=100, I can't get it to deadlock on my laptop (60 clients in
all cases), at least in a reasonable time period. With the patch
btreelock_insert_on_dup.2013_12_12.patch.gz, it will never deadlock,
even with scale=1, simply because value locks are not held on to
across row locking. This is why I characterized the locking as
"opportunistic" on several occasions in months past.

The test-case is actually much simpler than the one I describe in [1],
and much simpler than all previous test-cases, as there is only one
unique index, though the problem is essentially the same. It is down
to old "value locks" held across retries - with "exclusion_...", we
can't *stop* locking things from previous locking attempts (where a
locking attempt is btree insertion with the UNIQUE_CHECK_PARTIAL
flag), because dirty snapshots still see
inserted-then-deleted-in-other-xact tuples. This deadlocking seems
unprincipled and unjustified, which is a concern that I had all along,
and a concern that Heikki seemed to share more recently [3]. This is
why I felt strongly all along that value locks ought to be cheap to
both acquire and _release_, and it's unfortunate that so much time was
wasted on tangential issues, though I do accept some responsibility
for that.


Hmm. If I understand the problem correctly, it's that as soon as another 
backend sees the tuple you've inserted and calls XactLockTableWait(), it 
will not stop waiting even if we later decide to kill the 
already-inserted tuple.


One approach to fix that would be to release and immediately re-acquire 
the transaction-lock, when you kill an already-inserted tuple. Then 
teach the callers of XactLockTableWait() to re-check if the tuple is 
still alive. I'm just waving hands here, but the general idea is to 
somehow wake up others when you kill the tuple.


We could make use of that facility to also let others to proceed, if you 
delete a tuple in the same transaction that you insert it. It's a corner 
case, not worth much on its own, but I think it would fall out of the 
above machinery for free, and be an easier way to test it than inducing 
deadlocks with ON DUPLICATE.


- Heikki




Re: [HACKERS] GIN improvements part 1: additional information

2013-12-20 Thread Alexander Korotkov
On Fri, Dec 20, 2013 at 11:36 PM, Heikki Linnakangas <
hlinnakan...@vmware.com> wrote:

> On 12/19/2013 03:33 PM, Heikki Linnakangas wrote:
>
>> On 12/17/2013 12:49 AM, Heikki Linnakangas wrote:
>>
>>> On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
>>>
>>>> On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas wrote:
>>>>
>>>>> On 12/12/2013 06:44 PM, Alexander Korotkov wrote:
>>>>>
>>>>>> When values are packed into small groups, we have to either insert
>>>>>> inefficiently encoded value or re-encode whole right part of values.
>>>>>
>>>>> It would probably be simplest to store newly inserted items
>>>>> uncompressed, in a separate area in the page. For example, grow the
>>>>> list of uncompressed items downwards from pg_upper, and the compressed
>>>>> items upwards from pg_lower. When the page fills up, re-encode the
>>>>> whole page.

>>> I hacked together an implementation of a variant of Simple9, to see how
>>> it performs. Insertions are handled per the above scheme.
>>>
>>
>> Here's an updated version of that, using the page layout without
>> item-indexes that I described in the other post. This is much less buggy
>> than that earlier crude version I posted - and unfortunately it doesn't
>> compress as well. The earlier version lost some items :-(.
>>
>> Nevertheless, I think this page layout and code formatting is better,
>> even if we switch the encoding back to the varbyte encoding in the end.
>>
>
> Yet another version. The encoding/decoding code is now quite isolated in
> ginpostinglist.c, so it's easy to experiment with different encodings. This
> patch uses varbyte encoding again.
>
> I got a bit carried away, experimented with a bunch of different
> encodings. I tried rice encoding, rice encoding with block and offset
> number delta stored separately, the simple9 variant, and varbyte encoding.
>
> The compressed size obviously depends a lot on the distribution of the
> items, but in the test set I used, the differences between different
> encodings were quite small.
>
> One fatal problem with many encodings is VACUUM. If a page is completely
> full and you remove one item, the result must still fit. In other words,
> removing an item must never enlarge the space needed. Otherwise we have to
> be able to split on vacuum, which adds a lot of code, and also makes it
> possible for VACUUM to fail if there is no disk space left. That's
> unpleasant if you're trying to run VACUUM to release disk space. (gin fast
> updates already has that problem BTW, but let's not make it worse)
>
> I believe that eliminates all encodings in the Simple family, as well as
> PForDelta, and surprisingly also Rice encoding. For example, if you have
> three items in consecutive offsets, the differences between them are
> encoded as 11 in rice encoding. If you remove the middle item, the encoding
> for the next item becomes 010, which takes more space than the original.
>
> AFAICS varbyte encoding is safe from that. (a formal proof would be nice
> though).
>

The formal proof goes like this. Removing a number is actually the replacement
of two numbers with their sum. We have to prove that the varbyte encoding of
the sum can't be longer than the varbyte encodings of the two summands
combined. The highest set bit of the sum is at most one position higher than
that of the larger summand, so the varbyte encoding of the sum is at most one
byte longer than the varbyte encoding of the larger summand. And the lesser
summand takes at least one byte.


> So, I'm happy to go with varbyte encoding now, indeed I don't think we
> have much choice unless someone can come up with an alternative that's
> VACUUM-safe. I have to put this patch aside for a while now, I spent a lot
> more time on these encoding experiments than I intended. If you could take
> a look at this latest version, spend some time reviewing it and cleaning up
> any obsolete comments, and re-run the performance tests you did earlier,
> that would be great. One thing I'm slightly worried about is the overhead
> of merging the compressed and uncompressed posting lists in a scan. This
> patch will be in good shape for the final commitfest, or even before that.


OK. I'll make tests for scans on mix of compressed and uncompressed posting
lists. However, I don't expect it to be slower than both pure compressed
and pure uncompressed posting lists. Overhead of compressing uncompressed
posting lists is evident. But it also would be nice to measure.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-20 Thread Alvaro Herrera
Alexander Korotkov escribió:
> On Fri, Dec 20, 2013 at 11:43 PM, Alvaro Herrera
> wrote:
> 
> > Heikki Linnakangas escribió:
> >
> > > I believe that eliminates all encodings in the Simple family, as
> > > well as PForDelta, and surprisingly also Rice encoding. For example,
> > > if you have three items in consecutive offsets, the differences
> > > between them are encoded as 11 in rice encoding. If you remove the
> > > middle item, the encoding for the next item becomes 010, which takes
> > > more space than the original.
> >
> > I don't understand this.  If you have three consecutive entries, and the
> > differences between them are 11, you need to store two 11s.  But if you
> > have two items, you only need to store 010 once.  So the difference is
> > larger, but since you need to store only one of them then overall it's
> > still shorter than the original.  No?
> 
> I believe Heikki means both differences are encoded as 11, i.e. each one is 1.

Oh, that sucks (or it's great, depending on perspective).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 2:17 PM, Alvaro Herrera
 wrote:
> Robert Haas escribió:
>> On Fri, Dec 20, 2013 at 1:41 PM, Alvaro Herrera
>>  wrote:
>
>> > I assume without checking that passing reloid/ctid would allow this to
>> > work for tuples in a RETURNING clause; and if we ever have an OLD
>> > reference for the RETURNING clause of an UPDATE, that it would work
>> > there, too, showing the post-update status of the updated tuple.
>>
>> I don't understand what you're saying here.  Are you saying that
>> reloid/ctid is a better approach, a worse approach, or just a
>> different approach?
>
> That probably wasn't worded very well.  I am just saying that whatever
> approach we end up with, it would be nice that it worked somehow with
> RETURNING clauses.

Hmm.  It's not evident to me why that particular case would be any
different than anything else, but that might be my ignorance showing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] shared memory message queues

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 2:33 PM, Andres Freund  wrote:
> On 2013-12-20 14:10:57 -0500, Robert Haas wrote:
>> > Since you're embedding spinlocks in struct shm_toc, this module will be
>> > in conflict with platforms that do --disable-spinlocks, since the number
>> > of spinlocks essentially needs to be predetermined there. I personally
>> > still think the solution to that is getting rid of --disable-spinlocks.
>>
>> Yep.  We can either deprecate --disable-spinlocks, or we can add an
>> API that is the reverse of S_INIT_LOCK().
>
> How is that going to help? Even if we'd deallocate unused spinlocks -
> which sure is a good idea when they require persistent resources like
> the PGSemaphore based one - the maximum is still going to be there.

Well, we can set the maximum to a bigger number, if that's what we
want to do, but I'll admit it's unclear what value would be
sufficient.  We probably won't have more than one TOC per DSM, and the
maximum number of DSMs is capped based on MaxBackends, but it seems
likely that many applications of dynamic shared memory will require
spinlocks, and I don't think we can plausibly calculate an upper bound
there.  If we could it would be a big number, and that might just make
startup fail anyway.

Personally, I don't have a big problem with the idea that if you
compile with --disable-spinlocks, some things may just fail, and
parallel whatever might be one of those things.  We could just bump
the "30" in SpinlockSemas to "100", and if you happen to use more than
that, too bad for you: it'll fail.

But I also don't have a big problem with removing --disable-spinlocks
altogether.  What do other people think?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GIN improvements part 1: additional information

2013-12-20 Thread Alexander Korotkov
On Fri, Dec 20, 2013 at 11:43 PM, Alvaro Herrera
wrote:

> Heikki Linnakangas escribió:
>
> > I believe that eliminates all encodings in the Simple family, as
> > well as PForDelta, and surprisingly also Rice encoding. For example,
> > if you have three items in consecutive offsets, the differences
> > between them are encoded as 11 in rice encoding. If you remove the
> > middle item, the encoding for the next item becomes 010, which takes
> > more space than the original.
>
> I don't understand this.  If you have three consecutive entries, and the
> differences between them are 11, you need to store two 11s.  But if you
> have two items, you only need to store 010 once.  So the difference is
> larger, but since you need to store only one of them then overall it's
> still shorter than the original.  No?


I believe Heikki meant that the two differences together are encoded as 11: each difference is 1, and each is encoded as a single 1.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Alvaro Herrera
Robert Haas escribió:
> On Fri, Dec 20, 2013 at 1:41 PM, Alvaro Herrera
>  wrote:

> > I assume without checking that passing reloid/ctid would allow this to
> > work for tuples in a RETURNING clause; and if we ever have an OLD
> > reference for the RETURNING clause of an UPDATE, that it would work
> > there, too, showing the post-update status of the updated tuple.
> 
> I don't understand what you're saying here.  Are you saying that
> reloid/ctid is a better approach, a worse approach, or just a
> different approach?

That probably wasn't worded very well.  I am just saying that whatever
approach we end up with, it would be nice that it worked somehow with
RETURNING clauses.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GIN improvements part 1: additional information

2013-12-20 Thread Alvaro Herrera
Heikki Linnakangas escribió:

> I believe that eliminates all encodings in the Simple family, as
> well as PForDelta, and surprisingly also Rice encoding. For example,
> if you have three items in consecutive offsets, the differences
> between them are encoded as 11 in rice encoding. If you remove the
> middle item, the encoding for the next item becomes 010, which takes
> more space than the original.

I don't understand this.  If you have three consecutive entries, and the
differences between them are 11, you need to store two 11s.  But if you
have two items, you only need to store 010 once.  So the difference is
larger, but since you need to store only one of them then overall it's
still shorter than the original.  No?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logging WAL when updating hintbit

2013-12-20 Thread Heikki Linnakangas

On 12/20/2013 08:34 PM, Fujii Masao wrote:

On Fri, Dec 20, 2013 at 2:05 PM, Sawada Masahiko  wrote:

On Fri, Dec 20, 2013 at 3:38 AM, Sawada Masahiko  wrote:

I attached a patch which changes the name from 'wal_log_hintbits' to
'wal_log_hints'.
The plural form seemed to gain the most approval.


Sorry, the patch I attached had wrong indentation in pg_controldata.
I have fixed it and attached a new version.


Thanks! Committed.


Thanks. I was hoping someone would come up with an even better name, but 
since no-one did, I agree that's better.


- Heikki




Re: [HACKERS] GIN improvements part 1: additional information

2013-12-20 Thread Heikki Linnakangas

On 12/19/2013 03:33 PM, Heikki Linnakangas wrote:

On 12/17/2013 12:49 AM, Heikki Linnakangas wrote:

On 12/17/2013 12:22 AM, Alexander Korotkov wrote:

On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas

wrote:



On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

  When values are packed into small groups, we have to either insert

an inefficiently encoded value or re-encode the whole right part of the values.


It would probably be simplest to store newly inserted items
uncompressed,
in a separate area in the page. For example, grow the list of
uncompressed
items downwards from pg_upper, and the compressed items upwards from
pg_lower. When the page fills up, re-encode the whole page.


I hacked together an implementation of a variant of Simple9, to see how
it performs. Insertions are handled per the above scheme.


Here's an updated version of that, using the page layout without
item-indexes that I described in the other post. This is much less buggy
than that earlier crude version I posted - and unfortunately it doesn't
compress as well. The earlier version lost some items :-(.

Nevertheless, I think this page layout and code formatting is better,
even if we switch the encoding back to the varbyte encoding in the end.


Yet another version. The encoding/decoding code is now quite isolated in 
ginpostinglist.c, so it's easy to experiment with different encodings. 
This patch uses varbyte encoding again.


I got a bit carried away, experimented with a bunch of different 
encodings. I tried rice encoding, rice encoding with block and offset 
number delta stored separately, the simple9 variant, and varbyte encoding.


The compressed size obviously depends a lot on the distribution of the 
items, but in the test set I used, the differences between different 
encodings were quite small.


One fatal problem with many encodings is VACUUM. If a page is completely 
full and you remove one item, the result must still fit. In other words, 
removing an item must never enlarge the space needed. Otherwise we have 
to be able to split on vacuum, which adds a lot of code, and also makes 
it possible for VACUUM to fail if there is no disk space left. That's 
unpleasant if you're trying to run VACUUM to release disk space. (gin 
fast updates already has that problem BTW, but let's not make it worse)


I believe that eliminates all encodings in the Simple family, as well as 
PForDelta, and surprisingly also Rice encoding. For example, if you have 
three items in consecutive offsets, the differences between them are 
encoded as 11 in rice encoding. If you remove the middle item, the 
encoding for the next item becomes 010, which takes more space than the 
original.


AFAICS varbyte encoding is safe from that. (a formal proof would be nice 
though).


So, I'm happy to go with varbyte encoding now, indeed I don't think we 
have much choice unless someone can come up with an alternative that's 
VACUUM-safe. I have to put this patch aside for a while now, I spent a 
lot more time on these encoding experiments than I intended. If you 
could take a look at this latest version, spend some time reviewing it 
and cleaning up any obsolete comments, and re-run the performance tests 
you did earlier, that would be great. One thing I'm slightly worried 
about is the overhead of merging the compressed and uncompressed posting 
lists in a scan. This patch will be in good shape for the final 
commitfest, or even before that.


- Heikki


gin-packed-postinglists-varbyte2.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] shared memory message queues

2013-12-20 Thread Andres Freund
On 2013-12-20 14:10:57 -0500, Robert Haas wrote:
> > Since you're embedding spinlocks in struct shm_toc, this module will be
> > in conflict with platforms that do --disable-spinlocks, since the number
> > of spinlocks essentially needs to be predetermined there. I personally
> > still think the solution to that is getting rid of --disable-spinlocks.
> 
> Yep.  We can either deprecate --disable-spinlocks, or we can add an
> API that is the reverse of S_INIT_LOCK().

How is that going to help? Even if we'd deallocate unused spinlocks -
which sure is a good idea when they require persistent resources like
the PGSemaphore based one - the maximum is still going to be there.

> > I vote for removing all the shm_toc_estimator() knowledge from the
> > header, there seems little reason to expose it that way. That just
> > exposes unneccessary details and makes fixes after releases harder
> > (requiring extensions to recompile). Function call costs hardly can
> > matter, right?
> 
> Oh, you're saying to make it a function rather than a macro?  Sure, I
> could do that.

Yea, keeping it private, as a function, seems like a good idea.

> > Do you assume that lookup speed isn't that important? I have a bit of a
> > hard time imagining that linear search over the keys isn't going to bite
> > us. At the least there should be a comment somewhere documenting that
> the intended use case is having relatively few keys.
> 
> Well, as previously noted, in the use cases I imagine for this, it'll
> be nworkers + somesmallconstant.  I can add a comment to that effect.

I primarily was wondering because you have gone through a bit of effort
making it lockless. A comment would be good, yes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] patch: option --if-exists for pg_dump

2013-12-20 Thread Pavel Stehule
Hello

next version

pg_restore now supports the --if-exists option

Regards

Pavel Stehule


2013/12/13 Pavel Stehule 

> Hello
>
> I am sending a rebased patch.
>
> A dump generated with the --if-exists option is now readable by pg_restore
>
> Regards
>
> Pavel
>
commit 7c79aa77ccf53252bac8ce2302a7a0f7a1942e9b
Author: Pavel Stehule 
Date:   Fri Dec 20 20:13:00 2013 +0100

initial

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8d45f24..18f7346 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -650,6 +650,16 @@ PostgreSQL documentation
  
 
  
+  --if-exists
+  
+   
+Use conditional commands (with the IF EXISTS
+clause) when cleaning database objects.
+   
+  
+ 
+
+ 
   --disable-dollar-quoting
   

diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index 5c6a101..670539e 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -301,6 +301,16 @@ PostgreSQL documentation
  
 
  
+  --if-exists
+  
+   
+Use conditional commands (with the IF EXISTS
+clause) when cleaning database objects.
+   
+  
+ 
+
+ 
   --inserts
   

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 717da42..dd8e990 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -490,6 +490,16 @@
  
 
  
+  --if-exists
+  
+   
+Use conditional commands (with the IF EXISTS
+clause) when cleaning database objects.
+   
+  
+ 
+
+ 
   --no-data-for-failed-tables
   

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 6927968..83f7216 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -113,6 +113,7 @@ typedef struct _restoreOptions
 	char	   *superuser;		/* Username to use as superuser */
 	char	   *use_role;		/* Issue SET ROLE to this */
 	int			dropSchema;
+	int			if_exists;
 	const char *filename;
 	int			dataOnly;
 	int			schemaOnly;
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 63a8009..327ff03 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -55,7 +55,8 @@ static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
 	 const int compression, ArchiveMode mode, SetupWorkerPtr setupWorkerPtr);
 static void _getObjectDescription(PQExpBuffer buf, TocEntry *te,
 	  ArchiveHandle *AH);
-static void _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isData, bool acl_pass);
+static void _printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt,
+ bool isData, bool acl_pass);
 static char *replace_line_endings(const char *str);
 static void _doSetFixedOutputState(ArchiveHandle *AH);
 static void _doSetSessionAuth(ArchiveHandle *AH, const char *user);
@@ -234,6 +235,7 @@ RestoreArchive(Archive *AHX)
 	bool		parallel_mode;
 	TocEntry   *te;
 	OutputContext sav;
+	
 
 	AH->stage = STAGE_INITIALIZING;
 
@@ -409,8 +411,24 @@ RestoreArchive(Archive *AHX)
 /* Select owner and schema as necessary */
 _becomeOwner(AH, te);
 _selectOutputSchema(AH, te->namespace);
-/* Drop it */
-ahprintf(AH, "%s", te->dropStmt);
+
+if (*te->dropStmt != '\0')
+{
+	if (ropt->if_exists)
+	{
+		int		desc_len = strlen(te->desc);
+
+		/* The IF EXISTS clause may already be present (pg_dump can emit it) */
+		if (strncmp(te->dropStmt + desc_len + 5, " IF EXISTS", 10) != 0)
+			ahprintf(AH, "DROP %s IF EXISTS %s",
+			  te->desc,
+		  te->dropStmt + 6 + desc_len);
+		else
+			ahprintf(AH, "%s", te->dropStmt);
+	}
+	else
+		ahprintf(AH, "%s", te->dropStmt);
+}
 			}
 		}
 
@@ -2938,9 +2956,35 @@ _getObjectDescription(PQExpBuffer buf, TocEntry *te, ArchiveHandle *AH)
 		strcmp(type, "OPERATOR CLASS") == 0 ||
 		strcmp(type, "OPERATOR FAMILY") == 0)
 	{
-		/* Chop "DROP " off the front and make a modifiable copy */
-		char	   *first = pg_strdup(te->dropStmt + 5);
-		char	   *last;
+		char	*first;
+		char	*last;
+
+		/* try to search IF EXISTS in DROP command */
+		if (strstr(te->dropStmt, "IF EXISTS") != NULL)
+		{
+			char buffer[40];
+			size_t   l;
+
+			/* The IF EXISTS clause may already be present; check for it */
+			snprintf(buffer, sizeof(buffer), "DROP %s IF EXISTS", type);
+			l = strlen(buffer);
+
+			if (strncmp(te->dropStmt, buffer, l) == 0)
+			{
+/* append command type to target type */
+appendPQExpBufferStr(buf, type);
+
+/* skip the first n chars, and create a modifiable copy */
+first = pg_strdup(te->dropStmt + l);
+			}
+			else
+/* the DROP IF EXISTS pattern is not applicable to dropStmt */
+first = pg_strdup(te->dropStmt + 5);
+		}
+
+		else
+			/* IF EXISTS clause was not used, simple solution */
+	

Re: [HACKERS] Logging WAL when updating hintbit

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 9:06 AM, Alvaro Herrera
 wrote:
> Michael Paquier escribió:
>> On Fri, Dec 20, 2013 at 1:05 PM, Sawada Masahiko  
>> wrote:
>
>> > Sorry, the patch I attached had wrong indentation in pg_controldata.
>> > I have fixed it and attached a new version.
>> Now that you send this patch, I am just recalling some recent email
>> from Tom arguing about avoiding to mix lower and upper-case characters
>> for a GUC parameter name:
>> http://www.postgresql.org/message-id/30569.1384917...@sss.pgh.pa.us
>>
>> To fullfill this requirement, could you replace walLogHints by
>> wal_log_hints in your patch? Thoughts from others?
>
> The issue is with the user-visible variables, not with internal
> variables implementing them.  I think the patch is sane.  (Other than
> the fact that it was posted as a patch-on-patch instead of
> patch-on-master).

But spelling it the same way everywhere really improves greppability.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 1:41 PM, Alvaro Herrera
 wrote:
> Robert Haas escribió:
>> On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund  
>> wrote:
>> >> Maybe what we should do is add a function something like
>> >> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
>> >> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
>> >> int, hoff int).  Or perhaps some slightly more cooked version of that
>> >> information.  And then delete the xmin, xmax, cmin, and cmax system
>> >> columns.  That'd save significantly on pg_attribute entries while, at
>> >> the same time, actually providing more information than we do today.
>> >
>> > I was wondering whether we couldn't just pass pg_tuple_header() a whole
>> > row, instead of having the user manually pass in reloid and ctid. I
>> > think that should actually work in the interesting scenarios.
>>
>> I wondered that, too, but it's not well-defined for all tuples.  What
>> happens if you pass in a constructed tuple rather than an on-disk tuple?
>
> I assume without checking that passing reloid/ctid would allow this to
> work for tuples in a RETURNING clause; and if we ever have an OLD
> reference for the RETURNING clause of an UPDATE, that it would work
> there, too, showing the post-update status of the updated tuple.

I don't understand what you're saying here.  Are you saying that
reloid/ctid is a better approach, a worse approach, or just a
different approach?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] shared memory message queues

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 1:11 PM, Andres Freund  wrote:
> On 2013-10-31 12:21:31 -0400, Robert Haas wrote:
>> Patch #2, shm-toc-v1.patch, provides a facility for sizing a dynamic
>> shared memory segment before creation, and for dividing it up into
>> chunks after it's been created.  It therefore serves a function quite
>> similar to RequestAddinShmemSpace, except of course that there is only
>> one main shared memory segment created at postmaster startup time,
>> whereas new dynamic shared memory segments can come into existence on
>> the fly; and it serves even more conspicuously the function of
>> ShmemIndex, which enables backends to locate particular data
>> structures within the shared memory segment.  It is however quite a
>> bit simpler than the ShmemIndex mechanism: we don't need the same
>> level of extensibility here that we do for the main shared memory
>> segment, because a new extension need not piggyback on an existing
>> dynamic shared memory segment, but can create a whole segment of its
>> own.
>
> So, without the argument of having per-extension dsm segments, I'd say
> that a purely integer key sucks, because it's hard to manage and
> debug. This way it's still not too nice, but I don't see an all that good
> alternative.
>
> Comments about shm-toc-v1.patch:
>
> Since you're embedding spinlocks in struct shm_toc, this module will be
> in conflict with platforms that do --disable-spinlocks, since the number
> of spinlocks essentially needs to be predetermined there. I personally
> still think the solution to that is getting rid of --disable-spinlocks.

Yep.  We can either deprecate --disable-spinlocks, or we can add an
API that is the reverse of S_INIT_LOCK().

> I vote for removing all the shm_toc_estimator() knowledge from the
> header, there seems little reason to expose it that way. That just
> exposes unneccessary details and makes fixes after releases harder
> (requiring extensions to recompile). Function call costs hardly can
> matter, right?

Oh, you're saying to make it a function rather than a macro?  Sure, I
could do that.

> Do you assume that lookup speed isn't that important? I have a bit of a
> hard time imagining that linear search over the keys isn't going to bite
> us. At the least there should be a comment somewhere documenting that
> the intended use case is having relatively few keys.

Well, as previously noted, in the use cases I imagine for this, it'll
be nworkers + somesmallconstant.  I can add a comment to that effect.

> Wouldn't it make sense to have a function that does the combined job of
> shm_toc_insert() and shm_toc_allocate()?

We could, but I don't really feel compelled.  It's not hard to call
them individually if that's the behavior you happen to want.

> What's the assumption about the use of the magic in create/attach? Just
> statically defined things in user code?

Yeah.

> Ok, cooking now, then I'll have a look at the queue itself,

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Alvaro Herrera
Robert Haas escribió:
> On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund  wrote:
> >> Maybe what we should do is add a function something like
> >> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
> >> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
> >> int, hoff int).  Or perhaps some slightly more cooked version of that
> >> information.  And then delete the xmin, xmax, cmin, and cmax system
> >> columns.  That'd save significantly on pg_attribute entries while, at
> >> the same time, actually providing more information than we do today.
> >
> > I was wondering whether we couldn't just pass pg_tuple_header() a whole
> > row, instead of having the user manually pass in reloid and ctid. I
> > think that should actually work in the interesting scenarios.
> 
> I wondered that, too, but it's not well-defined for all tuples.  What
> happens if you pass in a constructed tuple rather than an on-disk tuple?

I assume without checking that passing reloid/ctid would allow this to
work for tuples in a RETURNING clause; and if we ever have an OLD
reference for the RETURNING clause of an UPDATE, that it would work
there, too, showing the post-update status of the updated tuple.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logging WAL when updating hintbit

2013-12-20 Thread Fujii Masao
On Fri, Dec 20, 2013 at 2:05 PM, Sawada Masahiko  wrote:
> On Fri, Dec 20, 2013 at 3:38 AM, Sawada Masahiko  
> wrote:
>> On Thu, Dec 19, 2013 at 12:37 PM, Amit Kapila  
>> wrote:
>>> On Wed, Dec 18, 2013 at 11:30 AM, Michael Paquier
>>>  wrote:
 On Wed, Dec 18, 2013 at 11:22 AM, Amit Kapila  
 wrote:
> On Fri, Dec 13, 2013 at 7:57 PM, Heikki Linnakangas
>  wrote:
>> Thanks, committed with some minor changes:
>
> Should this patch in CF app be moved to Committed Patches or is there
> something left for this patch?
 Nothing has been forgotten for this patch. It can be marked as committed.
>>>
>>> Thanks for confirmation, I have marked it as Committed.
>>>
>>
>> Thanks!
>>
>> I attached a patch which changes the name from 'wal_log_hintbits' to
>> 'wal_log_hints'.
>> The plural form seemed to gain the most approval.
>>
>
> Sorry, the patch I attached had wrong indentation in pg_controldata.
> I have fixed it and attached a new version.

Thanks! Committed.

Regards,

-- 
Fujii Masao




Re: [HACKERS] shared memory message queues

2013-12-20 Thread Andres Freund
On 2013-10-31 12:21:31 -0400, Robert Haas wrote:
> Patch #2, shm-toc-v1.patch, provides a facility for sizing a dynamic
> shared memory segment before creation, and for dividing it up into
> chunks after it's been created.  It therefore serves a function quite
> similar to RequestAddinShmemSpace, except of course that there is only
> one main shared memory segment created at postmaster startup time,
> whereas new dynamic shared memory segments can come into existence on
> the fly; and it serves even more conspicuously the function of
> ShmemIndex, which enables backends to locate particular data
> structures within the shared memory segment.  It is however quite a
> bit simpler than the ShmemIndex mechanism: we don't need the same
> level of extensibility here that we do for the main shared memory
> segment, because a new extension need not piggyback on an existing
> dynamic shared memory segment, but can create a whole segment of its
> own.

So, without the argument of having per-extension dsm segments, I'd say
that a purely integer key sucks, because it's hard to manage and
debug. This way it's still not too nice, but I don't see an all that good
alternative.

Comments about shm-toc-v1.patch:

Since you're embedding spinlocks in struct shm_toc, this module will be
in conflict with platforms that do --disable-spinlocks, since the number
of spinlocks essentially needs to be predetermined there. I personally
still think the solution to that is getting rid of --disable-spinlocks.

I vote for removing all the shm_toc_estimator() knowledge from the
header, there seems little reason to expose it that way. That just
exposes unneccessary details and makes fixes after releases harder
(requiring extensions to recompile). Function call costs hardly can
matter, right?

Do you assume that lookup speed isn't that important? I have a bit of a
hard time imagining that linear search over the keys isn't going to bite
us. At the least there should be a comment somewhere documenting that
the intended use case is having relatively few keys.

Wouldn't it make sense to have a function that does the combined job of
shm_toc_insert() and shm_toc_allocate()?

What's the assumption about the use of the magic in create/attach? Just
statically defined things in user code?

Ok, cooking now, then I'll have a look at the queue itself,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] XML Issue with DTDs

2013-12-20 Thread Robert Haas
On Thu, Dec 19, 2013 at 6:40 PM, Florian Pflug  wrote:
> While looking into ways to implement an XMLSTRIP function which extracts the
> textual contents of an XML value and de-escapes them (i.e. […]). Solving this
> seems a bit messy, unfortunately. First, I think we need to have some
> XMLOPTION value which is a superset of all the others - otherwise, dump & 
> restore won't work reliably. That means either allowing DTDs if XMLOPTION is 
> CONTENT, or inventing a third XMLOPTION, say ANY.

Or we can just decide that it was a bug that this was ever allowed,
and if you upgrade to $FIXEDVERSION you'll need to sanitize your data.
This is roughly what we did with encoding checks.

> We then need to ensure that combining XML values yields something that is 
> valid according to the most general XMLOPTION setting. That means either
>
> (1) Removing the DTD from all but the first argument to XMLCONCAT, and 
> similarly all but the first value passed to XMLAGG
>
> or
>
> (2) Complaining if these values contain a DTD.
>
> or
>
> (3) Allowing multiple DTDs in a document if XMLOPTION is, say, ANY.
>
> I'm not in favour of (3), since clients are unlikely to be able to process 
> such a value. (1) matches how we currently handle XML declarations (<?xml …?>), so I'm slightly in favour of that.

I don't like #3, mostly because I don't like XMLOPTION ANY in the
first place.  Either #1 or #2 sounds OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] make_interval ??

2013-12-20 Thread Josh Berkus
Pavel,

> So constructor should to look like:
> 
> CREATE OR REPLACE FUNCTION make_interval(years int DEFAULT 0, months int
> DEFAULT 0, ...)
> 
> and usage:
> 
> SELECT make_interval(years := 2)
> SELECT make_interval(days := 14)
> 
> Is there interest in this (or a similar) function?

It would certainly make our Python users happy.

And for that matter would get rid of this kind of stupid thing in stored
procedure code:

time_ahead := ( interval '1 minute' * var_skip );

So, +1 for the feature.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] [PATCH] Make various variables read-only (const)

2013-12-20 Thread Oskari Saarenmaa
This allows the variables to be moved from .data to .rodata section which
means that more data can be shared by processes and makes sure that nothing
can accidentally overwrite the read-only definitions.  On a x86-64 Linux
system this moves roughly 9 kilobytes of previously writable data to the
read-only data segment in the backend and 4 kilobytes in libpq.

https://github.com/saaros/postgres/compare/constify

24 files changed, 108 insertions(+), 137 deletions(-)

/ Oskari
>From 89f0348747b356eaccf5edc2f85bf47d0a35c4f4 Mon Sep 17 00:00:00 2001
From: Oskari Saarenmaa 
Date: Fri, 20 Dec 2013 18:38:10 +0200
Subject: [PATCH] Make various variables read-only (const)

This allows the variables to be moved from .data to .rodata section which
means that more data can be shared by processes and makes sure that nothing
can accidentally overwrite the read-only definitions.  On a x86-64 Linux
system this moves roughly 9 kilobytes of previously writable data to the
read-only data segment in the backend and 4 kilobytes in libpq.
---
 src/backend/access/common/reloptions.c | 31 
 src/backend/catalog/objectaddress.c|  4 +-
 src/backend/commands/conversioncmds.c  |  2 +-
 src/backend/commands/createas.c|  3 +-
 src/backend/commands/tablecmds.c   |  8 ++--
 src/backend/optimizer/path/costsize.c  |  2 +-
 src/backend/regex/regc_lex.c   |  4 +-
 src/backend/regex/regcomp.c|  2 +-
 src/backend/regex/regerror.c   |  6 +--
 src/backend/tcop/utility.c |  3 +-
 src/backend/tsearch/wparser_def.c  |  4 +-
 src/backend/utils/adt/datetime.c   |  6 +--
 src/backend/utils/adt/formatting.c | 67 +++---
 src/backend/utils/adt/tsrank.c | 16 
 src/backend/utils/mb/encnames.c| 31 
 src/backend/utils/mb/mbutils.c |  8 ++--
 src/backend/utils/mb/wchar.c   |  2 +-
 src/common/relpath.c   |  8 ++--
 src/include/access/reloptions.h|  5 +--
 src/include/common/relpath.h   |  3 +-
 src/include/mb/pg_wchar.h  | 18 -
 src/include/optimizer/cost.h   |  2 +-
 src/include/utils/datetime.h   |  2 +
 src/port/chklocale.c   |  8 ++--
 24 files changed, 108 insertions(+), 137 deletions(-)

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 31941e9..5ec617b 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -564,18 +564,17 @@ add_string_reloption(bits32 kinds, char *name, char *desc, char *default_val,
  * If ignoreOids is true, then we should ignore any occurrence of "oids"
  * in the list (it will be or has been handled by interpretOidsOption()).
  *
- * Note that this is not responsible for determining whether the options
- * are valid, but it does check that namespaces for all the options given are
- * listed in validnsps.  The NULL namespace is always valid and need not be
- * explicitly listed.  Passing a NULL pointer means that only the NULL
- * namespace is valid.
+ * Note that this is not responsible for determining whether the options are
+ * valid, but it does check that the namespace for each option given
+ * matches validnsp.  The NULL namespace is always valid.  Passing a NULL
+ * validnsp means that only the NULL namespace is valid.
  *
  * Both oldOptions and the result are text arrays (or NULL for "default"),
  * but we declare them as Datums to avoid including array.h in reloptions.h.
  */
 Datum
-transformRelOptions(Datum oldOptions, List *defList, char *namspace,
-					char *validnsps[], bool ignoreOids, bool isReset)
+transformRelOptions(Datum oldOptions, List *defList, const char *namspace,
+					const char *validnsp, bool ignoreOids, bool isReset)
 {
Datum   result;
ArrayBuildState *astate;
@@ -667,23 +666,7 @@ transformRelOptions(Datum oldOptions, List *defList, char *namspace,
 */
if (def->defnamespace != NULL)
{
-   boolvalid = false;
-   int i;
-
-   if (validnsps)
-   {
-   for (i = 0; validnsps[i]; i++)
-   {
-					if (pg_strcasecmp(def->defnamespace,
-									  validnsps[i]) == 0)
-   {
-   valid = true;
-   break;
-   }
-   }
-   }
-
-  

Re: [HACKERS] GIN improvements part 1: additional information

2013-12-20 Thread Heikki Linnakangas

On 12/19/2013 10:44 AM, Heikki Linnakangas wrote:

On 12/19/2013 08:37 AM, Oleg Bartunov wrote:

Guys,

before digging deep into the art of the comp/decomp world, I'd like to know
if you're familiar with the results of the
http://wwwconference.org/www2008/papers/pdf/p387-zhangA.pdf paper and
some newer research?


Yeah, I saw that paper.


Do we agree on what we really want? Basically,
there are three main features: size, compression speed, decompression speed
- we should take two :)


According to that Zhang et al paper you linked, the Vbyte method
actually performs the worst on all of those measures. The other
algorithms are quite similar in terms of size (PForDelta being the most
efficient), while PForDelta is significantly faster to compress/decompress.

Just by looking at those numbers, PForDelta looks like a clear winner.
However, it operates on much bigger batches than the other algorithms; I
haven't looked at it in detail, but Zhang et al used 128 integer
batches, and they say that 32 integers is the minimum batch size. If we
want to use it for the inline posting lists stored in entry tuples, that
would be quite wasteful if there are only a few item pointers on the tuple.

Also, in the tests I've run, the compression/decompression speed is not
a significant factor in total performance, with either varbyte encoding
or the Simple9-like encoding I hacked together.


One disadvantage of Simple9 and other encodings that operate in batches 
is that removing a value from the middle can increase the number of 
bytes required for the remaining values. That's a problem during vacuum; 
it's possible that after vacuuming away one item pointer, the remaining 
items no longer fit on the page.


- Heikki




Re: [HACKERS] ERROR during end-of-xact/FATAL

2013-12-20 Thread Alvaro Herrera
Robert Haas escribió:
> On Thu, Nov 28, 2013 at 10:10 AM, Alvaro Herrera
>  wrote:
> > Robert Haas escribió:
> >> On Wed, Nov 6, 2013 at 9:40 AM, Alvaro Herrera  
> >> wrote:
> >> > Noah Misch wrote:
> >> >> Incomplete list:
> >> >>
> >> >> - If smgrDoPendingDeletes() finds files to delete, mdunlink() and its 
> >> >> callee
> >> >>   relpathbackend() call palloc(); this is true in all supported 
> >> >> branches.  In
> >> >>   9.3, due to commit 279628a0, smgrDoPendingDeletes() itself calls 
> >> >> palloc().
> >> >>   (In fact, it does so even when the pending list is empty -- this is 
> >> >> the only
> >> >>   palloc() during a trivial transaction commit.)  palloc() failure there
> >> >>   yields a PANIC during commit.
> >> >
> >> > I think we should fix this routine to avoid the palloc when not 
> >> > necessary.
> >> > That initial palloc is pointless.
> >
> > Here's a trivial patch we could apply to 9.3 immediately.  Anything else
> > such as the ideas proposed below would require more effort than anyone
> > can probably spend here soon.
> 
> Yeah, this seems like a good thing to do for now.

Pushed, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Fw: LSF/MM 2014 Call For Proposals

2013-12-20 Thread Jonathan Corbet
During the direct I/O discussion I'd suggested that somebody from the
PostgreSQL community might want to put in an appearance at the LSFMM
summit in March.  Here's the CFP.  My guess is that a proposal for a
session on avoiding performance regressions for systems like PostgreSQL,
probably crossing all three tracks, would be well received.

Thanks,

jon

Begin forwarded message:

Date: Fri, 20 Dec 2013 09:30:22 +
From: Mel Gorman 
To: linux-s...@vger.kernel.org, linux-...@vger.kernel.org,
	linux...@kvack.org, linux-fsde...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org, lsf...@lists.linux-foundation.org
Subject: LSF/MM 2014 Call For Proposals


The annual Linux Storage, Filesystem and Memory Management Summit for
2014 will be held on March 24th and 25th before the Linux Foundation
Collaboration summit at The Meritage Resort, Napa Valley, CA.


http://events.linuxfoundation.org/events/linux-storage-filesystem-and-mm-summit
http://events.linuxfoundation.org/events/collaboration-summit

Note that we are running LSF/MM a little earlier in 2014 than in previous
years.

On behalf of the committee I would like to issue a call for agenda
proposals that are suitable for cross-track discussion as well as more
technical subjects for discussion in the breakout sessions.

1) Suggestions for agenda topics should be sent before January 31st
2014 to:

lsf...@lists.linux-foundation.org

and cc the Linux list or lists that are most interested in it:

ATA: linux-...@vger.kernel.org
FS: linux-fsde...@vger.kernel.org
MM: linux...@kvack.org
SCSI: linux-s...@vger.kernel.org

People who need more time for visa applications should send proposals
before January 15th. The committee will complete the first round of
selections on that date to accommodate applications.

Please remember to tag your subject with [LSF/MM TOPIC] to make it
easier to track. Agenda topics and attendees will be selected by the
program committee, but the final agenda will be formed by consensus of
the attendees on the day.

We will try to cap attendance at around 25-30 per track to facilitate
discussions although the final numbers will depend on the room sizes at
the venue.

2) Requests to attend the summit should be sent to:

lsf...@lists.linux-foundation.org

Please summarize what expertise you will bring to the meeting, and what
you would like to discuss. Please also tag your email with [LSF/MM ATTEND]
so there is less chance of it getting lost in the large mail pile.

Presentations are allowed to guide discussion, but are strongly
discouraged. There will be no recording or audio bridge. However, we expect
that written minutes will be published, as we did in previous years:

2013:
http://lwn.net/Articles/548089/

2012:
http://lwn.net/Articles/490114/
http://lwn.net/Articles/490501/

2011:
http://lwn.net/Articles/436871/
http://lwn.net/Articles/437066/

3) If you have feedback on last year's meeting that we can use to
improve this year's, please also send that to:

lsf...@lists.linux-foundation.org

Thank you on behalf of the program committee:

Storage:
James Bottomley
Martin K. Petersen

Filesystems:
Trond Myklebust
Jeff Layton
Dave Chinner
Jan Kara
Ted Ts'o

MM:
Rik van Riel
Michel Lespinasse

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




[HACKERS] [bug fix] psql \copy doesn't end if backend is killed

2013-12-20 Thread MauMau

Hello,

I've encountered a bug on PG 9.2 and fixed it for 9.4.  Please find attached 
the patch.  I'd like it to be backported to at least 9.2.



[Problem]
If the backend is terminated with SIGKILL while psql is running "\copy 
table_name from file_name", the \copy never ends.  I expected \copy 
to be cancelled because the corresponding server process had vanished.



[Cause]
psql could not get out of the loop below in handleCopyIn():

while (res = PQgetResult(conn), PQresultStatus(res) == PGRES_COPY_IN)
{
   OK = false;
   PQclear(res);

   PQputCopyEnd(pset.db, _("trying to exit copy mode"));
}

This situation is reached as follows:

1. handleCopyIn() calls PQputCopyData().
2. PQputCopyData() calls pqPutMsgEnd().
3. pqPutMsgEnd() calls pqFlush(), which calls pqSendSome().
4. pqSendSome() calls pqReadData().
5. At this moment, the backend is killed with SIGKILL.
6. pqReadData() fails to read the socket, receiving ECONNRESET.  It closes 
the socket.

7. As a result, PQputCopyData() fails in 2.
8. handleCopyIn() then calls PQputCopyEnd().
9. PQputCopyEnd() calls pqPutMsgEnd(), which calls pqFlush(), which in turn 
calls pqSendSome().

10. pqSendSome() fails because the socket is not open.
11. As a result, PQputCopyEnd() returns an error, leaving conn->asyncStatus 
at PGASYNC_COPY_IN.
12. Because conn->asyncStatus remains PGASYNC_COPY_IN, PQgetResult() 
continues to return pgresult whose status is PGRES_COPY_IN.



[Fix]
If the message transmission fails in PQputCopyEnd(), switch 
conn->asyncStatus back to PGASYNC_BUSY.  That causes PQgetResult() to try to 
read data with pqReadData().  pqReadData() fails and PQgetResult() returns 
NULL.  As a consequence, the loop in question terminates.



Regards
MauMau


failed_copy_loops.patch
Description: Binary data



Re: [HACKERS] Logging WAL when updating hintbit

2013-12-20 Thread Alvaro Herrera
Michael Paquier escribió:
> On Fri, Dec 20, 2013 at 1:05 PM, Sawada Masahiko  
> wrote:

> > Sorry the patch which I attached has wrong indent on pg_controldata.
> > I have modified it and attached the new version patch.
> Now that you send this patch, I am just recalling a recent email
> from Tom arguing against mixing lower- and upper-case characters
> in a GUC parameter name:
> http://www.postgresql.org/message-id/30569.1384917...@sss.pgh.pa.us
> 
> To fulfill this requirement, could you replace walLogHints with
> wal_log_hints in your patch? Thoughts from others?

The issue is with the user-visible variables, not with internal
variables implementing them.  I think the patch is sane.  (Other than
the fact that it was posted as a patch-on-patch instead of
patch-on-master).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: [bug fix] multibyte messages are displayed incorrectly on the client

2013-12-20 Thread Alvaro Herrera
Noah Misch escribió:
> On Tue, Dec 17, 2013 at 01:42:08PM -0500, Bruce Momjian wrote:
> > On Fri, Dec 13, 2013 at 10:41:17PM +0900, MauMau wrote:

> > > [Fix]
> > > Disable message localization during session startup.  In other
> > > words, messages are output in English until the database session is
> > > established.
> > 
> > I think the question is whether the server encoding or English are
> > likely to be better for the average client.  My bet is that the server
> > encoding is more likely correct.
> > 
> > However, you are right that English/ASCII at least will always be
> > viewable, while there are many server/client combinations that will
> > produce unreadable characters.
> > 
> > I would be interested to hear other people's experience with this.
> 
> I don't have a sufficient sense of multilingualism among our users to know
> whether English/ASCII messages would be more useful, on average, than
> localized messages in the server encoding.  Forcing English/ASCII does worsen
> behavior in the frequent situation where client encoding will match server
> encoding.  I lean toward retaining the status quo of delivering localized
> messages in the server encoding.

The problem is that if there's an encoding mismatch, the message might
be impossible to figure out.  If the message is in english, at least it
can be searched for in the web, or something -- the user might even find
a page in which the english error string appears, with a native language
explanation.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog

2013-12-20 Thread Alvaro Herrera
Haribabu kommi escribió:

> From the compilation I observed that libpgcommon is linked while building ecpg.
> I tested the same by using psprintf directly in ecpg code.
> 
> I modified libecpg's Makefile as suggested by you, which is attached to the
> mail, but the errors still occur. Please let me know if there is anything I
> missed.

I don't know what's the cause of the error you're seeing, but IIRC you
can't have a file in src/port depend on src/common functionality (which
psprintf is IIRC).  So you need to fix things up so that the psprintf()
doesn't occur in src/port at all.  Not sure what's the best way to go
about this; perhaps the whole new function should be in src/common; or
perhaps you need part of it in src/port and the bits with the funny
error reporting in src/common, where psprintf can be called without
issue.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Alvaro Herrera
Robert Haas escribió:

> Maybe what we should do is add a function something like
> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
> int, hoff int).  Or perhaps some slightly more cooked version of that
> information.  And then delete the xmin, xmax, cmin, and cmax system
> columns.  That'd save significantly on pg_attribute entries while, at
> the same time, actually providing more information than we do today.

+1 for this general idea.  I proposed this some time ago and got shot
down because of pageinspect.  I don't know about taking the system
columns out completely -- not sure how much third party code we're going
to break that way, certainly a lot -- but the extra functionality would
be useful nonetheless.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Andres Freund
On 2013-12-20 07:58:46 -0500, Robert Haas wrote:
> I think the immediate problem is to decide whether this patch ought to
> make the xmin column display the result of GetXmin() or GetRawXmin().
> Thoughts on that?

I slightly favor GetRawXmin().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 7:51 AM, Andres Freund  wrote:
> On 2013-12-20 07:47:17 -0500, Robert Haas wrote:
>> On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund  
>> wrote:
>> >> Maybe what we should do is add a function something like
>> >> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
>> >> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
>> >> int, hoff int).  Or perhaps some slightly more cooked version of that
>> >> information.  And then delete the xmin, xmax, cmin, and cmax system
>> >> columns.  That'd save significantly on pg_attribute entries while, at
>> >> the same time, actually providing more information than we do today.
>> >
>> > I was wondering whether we couldn't just pass pg_tuple_header() a whole
>> > row, instead of having the user manually pass in reloid and ctid. I
>> > think that should actually work in the interesting scenarios.
>>
>> I wondered that, too, but it's not well-defined for all tuples.  What
>> happens if you pass in constructed tuple rather than an on-disk tuple?
>
> Those should be discernible I think, t_self/t_tableOid won't be set for
> generated tuples.

Hmm.  That might work.

I think the immediate problem is to decide whether this patch ought to
make the xmin column display the result of GetXmin() or GetRawXmin().
Thoughts on that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Andres Freund
On 2013-12-20 07:47:17 -0500, Robert Haas wrote:
> On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund  wrote:
> >> Maybe what we should do is add a function something like
> >> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
> >> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
> >> int, hoff int).  Or perhaps some slightly more cooked version of that
> >> information.  And then delete the xmin, xmax, cmin, and cmax system
> >> columns.  That'd save significantly on pg_attribute entries while, at
> >> the same time, actually providing more information than we do today.
> >
> > I was wondering whether we couldn't just pass pg_tuple_header() a whole
> > row, instead of having the user manually pass in reloid and ctid. I
> > think that should actually work in the interesting scenarios.
> 
> I wondered that, too, but it's not well-defined for all tuples.  What
> happens if you pass in constructed tuple rather than an on-disk tuple?

Those should be discernible I think, t_self/t_tableOid won't be set for
generated tuples.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund  wrote:
>> Maybe what we should do is add a function something like
>> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
>> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
>> int, hoff int).  Or perhaps some slightly more cooked version of that
>> information.  And then delete the xmin, xmax, cmin, and cmax system
>> columns.  That'd save significantly on pg_attribute entries while, at
>> the same time, actually providing more information than we do today.
>
> I was wondering whether we couldn't just pass pg_tuple_header() a whole
> row, instead of having the user manually pass in reloid and ctid. I
> think that should actually work in the interesting scenarios.

I wondered that, too, but it's not well-defined for all tuples.  What
happens if you pass in constructed tuple rather than an on-disk tuple?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Andres Freund
On 2013-12-20 07:12:01 -0500, Robert Haas wrote:
> I think the root of the problem is that nobody's very eager to add
> more hidden system catalog columns because each one bloats
> pg_attribute significantly.

I think that part is actually solvable - there's no real need for them
to be real columns present in pg_attribute; things could easily enough be
set up during parse analysis. The bigger problem I see is that adding
more useful columns will cause name clashes, which will probably
prohibit improvements in that direction.

> Maybe what we should do is add a function something like
> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
> int, hoff int).  Or perhaps some slightly more cooked version of that
> information.  And then delete the xmin, xmax, cmin, and cmax system
> columns.  That'd save significantly on pg_attribute entries while, at
> the same time, actually providing more information than we do today.

I was wondering whether we couldn't just pass pg_tuple_header() a whole
row, instead of having the user manually pass in reloid and ctid. I
think that should actually work in the interesting scenarios.

> pageinspect is useful, too, but it seems to deal mostly with pages, so
> I'm not sure it's a natural substitute for something like this.

Agreed. I also think that we really need something to investigate
problems in core, not in a contrib module people won't have installed,
especially in bigger setups. That said, I plan to either submit some
improvements to pageinspect myself, or convince somebody else to do so,
in the not-too-distant future.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery

2013-12-20 Thread MauMau

From: "Fujii Masao" 

! if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)

Even when standby_mode is not enabled, we can use cascade replication and
it needs the accumulated WAL files. So I think that AllowCascadeReplication()
should be added into this condition.

!   snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
!   XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
!
!   if (restoredFromArchive)

Don't we need to check !StandbyModeRequested and !AllowCascadeReplication()
here?


Oh, you are correct.  Okay, done.



!   /*
!    * If the latest segment is not archival, but there's still a
!    * RECOVERYXLOG laying about, get rid of it.
!    */
!   unlink(recoveryPath);   /* ignore any error */

The similar line exists in the lower side of exitArchiveRecovery(), so ISTM
that you can refactor that.


That's already done in the previous patch: deletion of RECOVERYXLOG was 
moved into the else clause, as in:


-  /*
-   * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
-   * of it.
-   */
-  snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
-  unlink(recoveryPath);  /* ignore any error */
-



Regards
MauMau


wal_increase_in_pitr_v2.patch
Description: Binary data



Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Robert Haas
On Thu, Dec 19, 2013 at 9:22 PM, Alvaro Herrera
 wrote:
> Jim Nasby escribió:
>> One thing users will lose in this patch is the ability to reliably see if a 
>> tuple is frozen via SQL. Today you can do that just by selecting xmin from 
>> the table.
>>
>> Obviously people don't generally need to do that... but it's one of those 
>> things that when you do need it it's incredibly handy to have... would it be 
>> difficult to expose infomask(2) via SQL, the same way xmin et al. are?
>
> It's already exposed via the pageinspect extension.  It's doubtful that
> there are many valid cases where you need infomask but don't have access
> to that module.
>
> The real fix seems to ensure that the module is always available.

We could make the code that displays this column do GetXmin rather
than GetRawXmin, which would allow this question to be answered, but
lose the ability to find the original xmin once the tuple is frozen
and break precedent with the xmax and cmin/cmax fields, which return
the "raw" value from the tuple header.  But I'm OK to go whichever
direction people prefer.

I think the root of the problem is that nobody's very eager to add
more hidden system catalog columns because each one bloats
pg_attribute significantly.  Currently, we have xmin, xmax, cmin, and
cmax columns, and that's basically just a historical artifact at this
point.  cmin and cmax always return the same value, and the value
returned may be neither a cmin nor a cmax but a combo CID.  And if
you're looking for information that's only in the infomask, good luck.

Maybe what we should do is add a function something like
pg_tuple_header(tableoid, ctid) that returns a record, maybe something
like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
int, hoff int).  Or perhaps some slightly more cooked version of that
information.  And then delete the xmin, xmax, cmin, and cmax system
columns.  That'd save significantly on pg_attribute entries while, at
the same time, actually providing more information than we do today.

pageinspect is useful, too, but it seems to deal mostly with pages, so
I'm not sure it's a natural substitute for something like this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2013-12-20 Thread MauMau

From: "Amit Kapila" 

Few other points:
-
1.
#ifdef WIN32
/* Get event source from postgresql.conf for eventlog output */
get_config_value("event_source", event_source, sizeof(event_source));
#endif

event logging is done for both the win32 and cygwin environments, under
#if defined(WIN32) || defined(__CYGWIN__), so the event source name
should also be retrieved for both environments. Refer to the code below:

#if defined(WIN32) || defined(__CYGWIN__)
static void
write_eventlog(int level, const char *line)

2.
Docs needs to be updated for default value:
http://www.postgresql.org/docs/devel/static/event-log-registration.html
http://www.postgresql.org/docs/devel/static/runtime-config-logging.html#GUC-EVENT-SOURCE


Okay, done.  Thanks.  I'll update the commitfest entry this weekend.

Regards
MauMau



pg_ctl_eventsrc_v4.patch
Description: Binary data



Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog

2013-12-20 Thread Haribabu kommi
On 20 December 2013 02:02 Bruce Momjian wrote:
> On Thu, Dec 19, 2013 at 05:14:50AM +, Haribabu kommi wrote:
> > On 19 December 2013 05:31 Bruce Momjian wrote:
> > > On Wed, Dec 11, 2013 at 10:22:32AM +, Haribabu kommi wrote:
> > > > The make_absolute_path() function moving to port is changed in a
> > > > similar way as Bruce Momjian's approach. The psprintf is used to
> > > > store the error string which occurred in the function. But psprintf
> > > > is not used for storing the absolute path, because it is giving
> > > > problems in freeing the allocated memory in SelectConfigFiles:
> > > > the same memory is allocated in a different code branch from
> > > > guc_malloc.
> > > >
> > > > After adding the make_absolute_path() function with psprintf stuff
> > > > in the path.c file, it is giving a linking problem in compilation of
> > > > ecpg. I am not able to find the problem. So I added another file,
> > > > abspath.c, in port which contains these two functions.
> > >
> > > What errors are you seeing?
> >
> > If I move the make_absolute_path function from abspath.c to path.c, I
> > was getting following linking errors while compiling "ecpg".
> >
> > ../../../../src/port/libpgport.a(path.o): In function `make_absolute_path':
> > /home/hari/postgres/src/port/path.c:795: undefined reference to `psprintf'
> > /home/hari/postgres/src/port/path.c:809: undefined reference to `psprintf'
> > /home/hari/postgres/src/port/path.c:818: undefined reference to `psprintf'
> > /home/hari/postgres/src/port/path.c:830: undefined reference to `psprintf'
> > collect2: ld returned 1 exit status
> > make[4]: *** [ecpg] Error 1
> > make[3]: *** [all-preproc-recurse] Error 2
> > make[2]: *** [all-ecpg-recurse] Error 2
> > make[1]: *** [all-interfaces-recurse] Error 2
> > make: *** [all-src-recurse] Error 2
> 
> You didn't show the actual command that is generating the error, but I
> assume it is linking ecpg, not creating libecpg.  I think the issue is
> that path.c is specially handled when it is included in libecpg.  Here
> is a comment from the libecpg Makefile:
> 
>   # We use some port modules verbatim, but since we need to
>   # compile with appropriate options to build a shared lib, we can't
>   # necessarily use the same object files as the backend uses.  Instead,
>   # symlink the source files in here and build our own object file.
> 
> My guess is that libecpg isn't marked as linking to libpgcommon, and
> when you called psprintf in path.c, it added a libpgcommon link
> requirement.
> 
> My guess is that if you compiled common/psprintf.c like port/path.c in
> libecpg's Makefile, it would link fine.

Sorry for not mentioning the command, yes it is giving problem while linking 
ecpg.

From the compilation I observed that libpgcommon is linked while building ecpg.
I tested the same by using psprintf directly in ecpg code.

I modified libecpg's Makefile as suggested by you, which is attached to the
mail, but the errors still occur. Please let me know if there is anything I
missed.

Regards,
Hari babu.




make_abs_path_v3.patch
Description: make_abs_path_v3.patch



[HACKERS] make_interval ??

2013-12-20 Thread Pavel Stehule
Hello

we have defined constructors for the date, time, and timestamp types.

The question is whether we should have something similar for the interval type.

Unlike time and timestamp, here we can use zero as the default for each field.

So the constructor could look like:

CREATE OR REPLACE FUNCTION make_interval(years int DEFAULT 0, months int
DEFAULT 0, ...)

and usage:

SELECT make_interval(years := 2)
SELECT make_interval(days := 14)

Is there interest in this (or a similar) function?

Regards

Pavel