Re: [HACKERS] Ancient bug in formatting.c/to_char()

2014-05-29 Thread Heikki Linnakangas

On 05/29/2014 04:59 AM, Peter Geoghegan wrote:

Consider this example:

[local]/postgres=# SELECT to_char(1e9::float8,'9D9');
  to_char
--
  10.0
(1 row)

[local]/postgres=# SELECT to_char(1e20::float8,'9D9');
 to_char

   1
(1 row)

[local]/postgres=# SELECT to_char(1e20,'9D9');
  to_char
--
   1.0
(1 row)


There is access to uninitialized memory here. #define
DEBUG_TO_FROM_CHAR shows this to be the case (garbage is printed to
stdout).

Valgrind brought this to my attention. I propose the attached patch as
a fix for this issue.

The direct cause appears to be that float8_to_char() feels entitled to
truncate Number description post-digits, while that doesn't happen
within numeric_to_char(), say. This is very old code, from the
original to_char() patch back in 2000, and the interactions here are
not obvious. I think it should be okay not to do this truncation, since
the value of MAXDOUBLEWIDTH was greatly increased in 2007 as part of a
round of fixes to bugs of similar vintage. There is still a defensive
measure against over-sized input (we bail), but strictly speaking that
ought to be unnecessary.
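The truncate-versus-overflow question can be modelled outside of formatting.c. Below is a minimal Python sketch (`to_char_9d9` is a made-up name, not PostgreSQL code) of the convention the numeric path uses for pictures like '99': signal an over-wide value with '#' fill rather than truncating into, or overrunning, a fixed-size buffer.

```python
def to_char_9d9(value: float) -> str:
    """Toy model of a '9D9' picture: one integer digit, one fraction
    digit.  A value too wide for the picture yields '#' fill (the
    overflow convention to_char() uses for pictures such as '99')
    instead of reading or writing past a fixed-size buffer."""
    s = f"{value:.1f}"
    int_part, _, frac_part = s.partition(".")
    if len(int_part.lstrip("-")) > 1:
        return "#.#"
    return f"{int_part}.{frac_part}"
```

Under this convention 1e20 against '9D9' would render as "#.#" in both the float8 and numeric paths, instead of the divergent results shown above.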


postgres=# select to_char('0.1'::float8, '9D' || repeat('9', 1000));
ERROR:  double precision for character conversion is too wide

Yeah, the code is pretty crufty; I'm not sure what the proper fix would
be. I wonder if psprintf would be useful.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2014-05-29 Thread Simon Riggs
On 29 May 2014 01:07, Bruce Momjian  wrote:
> On Wed, May 28, 2014 at 04:04:13PM +0100, Simon Riggs wrote:
>> On 28 May 2014 15:34, Fujii Masao  wrote:
>>
>> >> Also, compress_backup_block GUC needs to be merged with full_page_writes.
>> >
>> > Basically I agree with you because I don't want to add new GUC very 
>> > similar to
>> > the existing one.
>> >
>> > But could you imagine the case where full_page_writes = off. Even in this 
>> > case,
>> > FPW is forcibly written only during base backup. Such FPW also should be
>> > compressed? Which compression algorithm should be used? If we want to
>> > choose the algorithm for such FPW, we would not be able to merge those two
>> > GUCs. IMO it's OK to always use the best compression algorithm for such FPW
>> > and merge them, though.
>>
>> I'd prefer a new name altogether
>>
>> torn_page_protection = 'full_page_writes'
>> torn_page_protection = 'compressed_full_page_writes'
>> torn_page_protection = 'none'
>>
>> this allows us to add new techniques later like
>>
>> torn_page_protection = 'background_FPWs'
>>
>> or
>>
>> torn_page_protection = 'double_buffering'
>>
>> when/if we add those new techniques
>
> Uh, how would that work if you want to compress the background_FPWs?
> Use compressed_background_FPWs?

We currently have one technique for torn-page protection, will soon have
two, and a third is on the horizon and likely to receive effort in the
next release.

It seems sensible to have just one parameter to describe the various
techniques, as suggested. I'm suggesting that we plan for how things
will look when we have the 3rd one as well.

Alternate suggestions welcome.
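The proposal above amounts to an enum-style parameter. A sketch of the check, where the value strings are Simon's suggestions from this thread (not a committed GUC); Bruce's hypothetical "compressed_background_FPWs" shows what a flat enum would refuse unless such a combination were added explicitly:

```python
# Hypothetical value set from the proposal in this thread.
TORN_PAGE_PROTECTION_VALUES = {
    "none",
    "full_page_writes",
    "compressed_full_page_writes",
    # candidate future techniques mentioned above:
    "background_FPWs",
    "double_buffering",
}

def assign_torn_page_protection(value: str) -> str:
    """Enum-style GUC check: accept only the known technique names."""
    if value not in TORN_PAGE_PROTECTION_VALUES:
        raise ValueError(f'invalid value for torn_page_protection: "{value}"')
    return value
```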

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Jsonb: jbvBinary usage in the convertJsonbValue?

2014-05-29 Thread Dmitry Dolgov
Hi all,

I'm a little confused by the *convertJsonbValue* function in *jsonb_utils.c*.
Maybe I misunderstood something, so I need help =)

>>> if (IsAJsonbScalar(val) || val->type == jbvBinary)
>>> convertJsonbScalar(buffer, header, val);

As far as I can see, the *convertJsonbScalar* function is used for scalar and
binary jsonb values. But this function doesn't handle the jbvBinary type.

>>> switch (scalarVal->type)
>>> case jbvNull:
>>> ...
>>> case jbvString:
>>> ...
>>> case jbvNumeric:
>>> .
>>> case jbvBool:
>>> .
>>> default:
>>> elog(ERROR, "invalid jsonb scalar type");

Does this mean that the binary type is incorrect input for *convertJsonbValue*?
Or have I missed something?


Re: [HACKERS] Proposing pg_hibernate

2014-05-29 Thread Gurjeet Singh
On May 29, 2014 12:12 AM, "Amit Kapila"  wrote:
>
> I agree with you that there are only a few corner cases where evicting
> shared buffers by this utility would harm, but I was wondering if we could
> even save those, say by having it use only available free buffers.  I think
> currently there is no such interface, and inventing a new interface for
> this case doesn't seem reasonable unless we see some other use case for
> such an interface.

+1


[HACKERS] json/jsonb inconsistence

2014-05-29 Thread Teodor Sigaev

# select  '"\uaBcD"'::json;
   json
--
 "\uaBcD"

but

# select  '"\uaBcD"'::jsonb;
ERROR:  invalid input syntax for type json
LINE 1: select  '"\uaBcD"'::jsonb;
^
DETAIL:  Unicode escape values cannot be used for code point values above 007F 
when the server encoding is not UTF8.

CONTEXT:  JSON data, line 1: ...

and

# select  '"\uaBcD"'::json -> 0;
ERROR:  invalid input syntax for type json
DETAIL:  Unicode escape values cannot be used for code point values above 007F 
when the server encoding is not UTF8.

CONTEXT:  JSON data, line 1: ...
Time: 0,696 ms

More than that, json on its own looks strange:

# select  '["\uaBcD"]'::json;
json

 ["\uaBcD"]

but

# select  '["\uaBcD"]'::json->0;
ERROR:  invalid input syntax for type json
DETAIL:  Unicode escape values cannot be used for code point values above 007F 
when the server encoding is not UTF8.

CONTEXT:  JSON data, line 1: [...

Looks like random parse rules.
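The rules look less random once the check itself is isolated. A rough Python sketch of the rule in question (simplified; the real parser also deals with surrogate pairs and `\u0000`): jsonb runs it at input time, while plain json defers it until the value is actually processed, e.g. by `->`.

```python
import re

def check_unicode_escapes(json_text: str, server_encoding: str) -> None:
    """Sketch of the encoding-dependent escape check: reject \\uXXXX
    escapes above 007F when the server encoding is not UTF8.  jsonb
    applies this at input; json only when the value is used."""
    for m in re.finditer(r"\\u([0-9a-fA-F]{4})", json_text):
        if int(m.group(1), 16) > 0x7F and server_encoding != "UTF8":
            raise ValueError(
                "Unicode escape values cannot be used for code point values "
                "above 007F when the server encoding is not UTF8")
```

Under this model, `'"\uaBcD"'::json` succeeds because nothing runs the check, while both `::jsonb` and `::json -> 0` do run it, which matches the behavior shown above.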

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




[HACKERS] json/jsonb inconsistence - 2

2014-05-29 Thread Teodor Sigaev

postgres=# select  '["\u0000"]'::json->0;
 ?column?
----------
 "\u0000"
(1 row)

Time: 1,294 ms
postgres=# select  '["\u0000"]'::jsonb->0;
 ?column?
-----------
 "\\u0000"
(1 row)

It seems to me that escape_json() is wrongly used in jsonb_put_escaped_value();
a better name for escape_json() would be escape_to_json().


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Compression of full-page-writes

2014-05-29 Thread Rahila Syed
>Thanks for extending and revising the FPW-compress patch! Could you add
>your patch into next CF?
Sure.  I will make improvements and add it to next CF.

>Isn't it worth measuring the recovery performance for each compression
>algorithm?
Yes I will post this soon.
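A toy harness for the kind of measurement being asked about, with stdlib zlib standing in for pglz/LZ4/snappy (which are not available here); the shape of the comparison is the same: compression ratio plus decompression time, the latter being the recovery-side cost.

```python
import time
import zlib

def measure_fpw_compression(page: bytes, level: int = 6):
    """Compress a page image and measure ratio and decompression time.
    zlib is only a stand-in for the algorithms under discussion."""
    compressed = zlib.compress(page, level)
    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    elapsed = time.perf_counter() - start
    assert restored == page          # round-trip sanity check
    return len(compressed) / len(page), elapsed
```

Running this over a corpus of real full-page images per algorithm and level would give both the WAL-size and recovery-time columns in one pass.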

On Wed, May 28, 2014 at 8:04 PM, Fujii Masao  wrote:

> On Tue, May 27, 2014 at 12:57 PM, Rahila Syed 
> wrote:
> > Hello All,
> >
> > 0001-CompressBackupBlock_snappy_lz4_pglz extends patch on compression of
> > full page writes to include LZ4 and Snappy . Changes include making
> > "compress_backup_block" GUC from boolean to enum. Value of the GUC can be
> > OFF, pglz, snappy or lz4 which can be used to turn off compression or set
> > the desired compression algorithm.
> >
> > 0002-Support_snappy_lz4 adds support for LZ4 and Snappy in PostgreSQL. It
> > uses Andres’s patch for getting Makefiles working and has a few wrappers
> to
> > make the function calls to LZ4 and Snappy compression functions and
> handle
> > varlena datatypes.
> > Patch Courtesy: Pavan Deolasee
>
> Thanks for extending and revising the FPW-compress patch! Could you add
> your patch into next CF?
>
> > Also, compress_backup_block GUC needs to be merged with full_page_writes.
>
> Basically I agree with you because I don't want to add new GUC very
> similar to
> the existing one.
>
> But could you imagine the case where full_page_writes = off. Even in this
> case,
> FPW is forcibly written only during base backup. Such FPW also should be
> compressed? Which compression algorithm should be used? If we want to
> choose the algorithm for such FPW, we would not be able to merge those two
> GUCs. IMO it's OK to always use the best compression algorithm for such FPW
> and merge them, though.
>
> > Tests use JDBC runner TPC-C benchmark to measure the amount of WAL
> > compression, tps and response time in each of the scenarios, viz.
> > Compression = OFF, pglz, LZ4, snappy, FPW = off
>
> Isn't it worth measuring the recovery performance for each compression
> algorithm?
>
> Regards,
>
> --
> Fujii Masao
>


Re: [HACKERS] json/jsonb inconsistence

2014-05-29 Thread Andrew Dunstan

On 05/29/2014 07:55 AM, Teodor Sigaev wrote:

# select  '"\uaBcD"'::json;
   json
--
 "\uaBcD"

but

# select  '"\uaBcD"'::jsonb;
ERROR:  invalid input syntax for type json
LINE 1: select  '"\uaBcD"'::jsonb;
^
DETAIL:  Unicode escape values cannot be used for code point values 
above 007F when the server encoding is not UTF8.

CONTEXT:  JSON data, line 1: ...

and

# select  '"\uaBcD"'::json -> 0;
ERROR:  invalid input syntax for type json
DETAIL:  Unicode escape values cannot be used for code point values 
above 007F when the server encoding is not UTF8.

CONTEXT:  JSON data, line 1: ...
Time: 0,696 ms

More than that, json on its own looks strange:

# select  '["\uaBcD"]'::json;
json

 ["\uaBcD"]

but

# select  '["\uaBcD"]'::json->0;
ERROR:  invalid input syntax for type json
DETAIL:  Unicode escape values cannot be used for code point values 
above 007F when the server encoding is not UTF8.

CONTEXT:  JSON data, line 1: [...

Looks like random parse rules.




It is documented that for json we don't check the validity of unicode 
escapes until we try to use them. That's because the original json 
parser didn't check them, and if we started doing so now users who 
pg_upgraded would find themselves with invalid data in the database. The 
rules for jsonb are more strict, because we actually resolve the unicode 
escapes when constructing the jsonb object. There is nothing at all 
random about it, although I agree it's slightly inconsistent. It's been 
the subject of some discussion on -hackers previously, IIRC. I actually 
referred to this difference in my talk at pgCon last Friday.


Frankly, if you want to use json/jsonb, you are going to be best served 
by using a UTF8-encoded database, or not using non-ascii unicode escapes 
in json at all.


cheers

andrew




[HACKERS] backup_label revisited

2014-05-29 Thread Greg Stark
So I ran into the case again where a system crashed while a hot backup
was being taken. Postgres couldn't start up automatically because the
backup_label was present. This has come up before e.g.
http://www.postgresql.org/message-id/caazkufap1gxcojtyzch13rvevjevwro1vvfbysqtwgud9is...@mail.gmail.com
but I believe no progress was made.

I was trying to think if we could somehow identify if the backup_label
was from a backup in progress or a restore in progress. Obvious
choices like putting the server IP address in it are not going to work,
for several reasons.

However, at least on Linux, wouldn't it be sufficient to put the inode
number of the backup_label file in the backup_label itself? If it's still
the same inode then it's just a restart, not a restore, since AFAIK
there's no way for tar or the like to recreate the file with the same
inode on any filesystem. That would even protect against another restore
on the same host.
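The idea can be sketched directly with stat(); the label field name below is invented for illustration, not a real backup_label field.

```python
import os

LABEL_INODE_PREFIX = "LABEL INODE: "   # hypothetical field name

def write_backup_label(path: str, contents: str) -> None:
    """Write a backup_label-like file that records its own inode number."""
    with open(path, "w") as f:
        f.write(contents)
        f.write(f"{LABEL_INODE_PREFIX}{os.fstat(f.fileno()).st_ino}\n")

def is_crash_restart(path: str) -> bool:
    """True if the file still has the inode it was created with; a label
    restored from a tar backup would have been recreated under a new inode."""
    recorded = None
    with open(path) as f:
        for line in f:
            if line.startswith(LABEL_INODE_PREFIX):
                recorded = int(line[len(LABEL_INODE_PREFIX):])
    return recorded == os.stat(path).st_ino
```

As noted in the thread this leans on Unix inode semantics; inode numbers can be reused after deletion, so it is a heuristic, not a proof.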


-- 
greg




Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Peter Eisentraut
On 5/27/14, 10:37 PM, Tom Lane wrote:
> I'm not terribly happy about pushing such a change post-beta1 either,
> but it's not like this isn't something we've known was needed.  Anyway,
> what's the worst case if we find a bug here?  Tell people not to use
> uuid-ossp?

Mainly some more discussion time would have been nice.  Also, while the
old ossp-based uuid was broken in some cases, it had a well-defined
workaround.  The new code introduces a whole new dimension of potential
portability issues.




Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Peter Eisentraut
On 5/27/14, 10:37 PM, Tom Lane wrote:
> If you don't like this change, we can revert it and also revert the upgrade 
> to 2.69.

Nobody else appears to be concerned, but I would have preferred this option.





Re: [HACKERS] json/jsonb inconsistence - 2

2014-05-29 Thread Andrew Dunstan


On 05/29/2014 08:00 AM, Teodor Sigaev wrote:

postgres=# select  '["\u0000"]'::json->0;
 ?column?
----------
 "\u0000"
(1 row)

Time: 1,294 ms
postgres=# select  '["\u0000"]'::jsonb->0;
 ?column?
-----------
 "\\u0000"
(1 row)

It seems to me that escape_json() is wrongly used in
jsonb_put_escaped_value(); a better name for escape_json() would be
escape_to_json().



That's a bug. I will look into it. I think we might need to special-case
\u0000 on output, just as we do on input.
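A minimal sketch of that special case (hypothetical, not the committed fix), assuming jsonb keeps an input \u0000 escape as the literal six-character sequence rather than an embedded NUL:

```python
def escape_json(s: str) -> str:
    """Sketch of the proposed output-side special case.  The stored
    \\u0000 marker is passed through unchanged; any other backslash or
    quote is escaped as usual.  The real escape_json also handles
    control characters and more."""
    out = ['"']
    i = 0
    while i < len(s):
        if s.startswith("\\u0000", i):
            out.append("\\u0000")        # pass the stored marker through
            i += 6
        elif s[i] == "\\":
            out.append("\\\\")
            i += 1
        elif s[i] == '"':
            out.append('\\"')
            i += 1
        else:
            out.append(s[i])
            i += 1
    return "".join(out) + '"'
```

With this rule, `'["\u0000"]'::jsonb->0` would print `"\u0000"` like the json case, instead of `"\\u0000"`.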


cheers

andrew





Re: [HACKERS] json/jsonb inconsistence

2014-05-29 Thread Teodor Sigaev

inconsistent. It's been the subject of some discussion on -hackers previously,
IIRC. I actually referred to this difference in my talk at pgCon last Friday.


I see, and I wasn't at your talk :( I'm playing around with jsquery and am
now trying to implement UTF escapes there.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Andres Freund
On 2014-05-29 08:14:48 -0400, Peter Eisentraut wrote:
> On 5/27/14, 10:37 PM, Tom Lane wrote:
> > If you don't like this change, we can revert it and also revert the upgrade 
> > to 2.69.
> 
> Nobody else appears to be concerned, but I would have preferred this option.

I am pretty concerned actually. But I don't see downgrading to an
earlier autoconf as something really helpful. There already was a huge
portability problem with the current code. Avoiding the autoconf update
for a while wouldn't have solved it. And in 5 years time the amount of
portability problems will be much larger.
Yes, it'd have been nice if this were done a month+ ago. But nobody
stepped up :(. Seems like the least bad choice :/

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix bogus %name-prefix option syntax in all our Bison files.

2014-05-29 Thread Peter Eisentraut
On 5/28/14, 7:02 PM, Andres Freund wrote:
> That's a good idea. What i've been thinking about is to add
> -Wno-deprecated to the bison rule in the interim. Maybe after a
> configure test for the option. All deprecation warnings so far seem to
> be pretty unhelpful.

Here is a patch.

diff --git a/config/programs.m4 b/config/programs.m4
index fd3a9a4..5570e55 100644
--- a/config/programs.m4
+++ b/config/programs.m4
@@ -23,6 +23,10 @@ if test "$BISON"; then
 *** Bison version 1.875 or later is required, but this is $pgac_bison_version.])
 BISON=""
   fi
+  if echo "$pgac_bison_version" | $AWK '{ if ([$]4 >= 3) exit 0; else exit 1;}'
+  then
+BISONFLAGS="$BISONFLAGS -Wno-deprecated"
+  fi
 fi
 
 if test -z "$BISON"; then
diff --git a/configure b/configure
index 3663e50..5499d56 100755
--- a/configure
+++ b/configure
@@ -7075,6 +7075,10 @@ $as_echo "$as_me: WARNING:
 *** Bison version 1.875 or later is required, but this is $pgac_bison_version." >&2;}
 BISON=""
   fi
+  if echo "$pgac_bison_version" | $AWK '{ if ($4 >= 3) exit 0; else exit 1;}'
+  then
+BISONFLAGS="$BISONFLAGS -Wno-deprecated"
+  fi
 fi
 
 if test -z "$BISON"; then
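The awk test in the patch reads field 4 of the first line of `bison --version` output (e.g. "bison (GNU Bison) 3.0.2") and compares it numerically. The same logic in Python terms, as a sketch:

```python
def bison_needs_no_deprecated(version_line: str) -> bool:
    """Mirror of the awk `$4 >= 3` test: Bison 3.x emits the noisy
    deprecation warnings, so only then is -Wno-deprecated added to
    BISONFLAGS."""
    fields = version_line.split()
    if len(fields) < 4:
        return False
    major = fields[3].split(".")[0]
    return major.isdigit() and int(major) >= 3
```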



Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Andrew Dunstan


On 05/29/2014 08:21 AM, Andres Freund wrote:

On 2014-05-29 08:14:48 -0400, Peter Eisentraut wrote:

On 5/27/14, 10:37 PM, Tom Lane wrote:

If you don't like this change, we can revert it and also revert the upgrade to 
2.69.

Nobody else appears to be concerned, but I would have preferred this option.

I am pretty concerned actually. But I don't see downgrading to an
earlier autoconf as something really helpful. There already was a huge
portability problem with the current code. Avoiding the autoconf update
for a while wouldn't have solved it. And in 5 years time the amount of
portability problems will be much larger.
Yes, it'd have been nice if this were done a month+ ago. But nobody
stepped up :(. Seems like the least bad choice :/




The most worrying thing is that we didn't find the problem that occasioned
this when we switched to autoconf 2.69 back in December, so we end up
dealing with bad choices now.


cheers

andrew




Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Peter Eisentraut
On 5/29/14, 8:21 AM, Andres Freund wrote:
> But I don't see downgrading to an
> earlier autoconf as something really helpful.

Well, we could have just hacked up that particular header check to do
what we want.




Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Andres Freund
On 2014-05-29 08:49:38 -0400, Peter Eisentraut wrote:
> On 5/29/14, 8:21 AM, Andres Freund wrote:
> > But I don't see downgrading to an
> > earlier autoconf as something really helpful.
> 
> Well, we could have just hacked up that particular header check to do
> what we want.

Still wouldn't have solved that ossp already didn't work on several
platforms at all anymore and is likely to work on even fewer in 5 years.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread John Lumby
Thanks for looking at it!

> Date: Thu, 29 May 2014 00:19:33 +0300
> From: hlinnakan...@vmware.com
> To: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
> CC: klaussfre...@gmail.com
> Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
> and patch
> 
> On 05/28/2014 11:52 PM, John Lumby wrote:
> >
> 
> The patch seems to assume that you can put the aiocb struct in shared 
> memory, initiate an asynchronous I/O request from one process, and wait 
> for its completion from another process. I'm pretty surprised if that 
> works on any platform.

It works on Linux. Actually this ability allows the asyncio implementation to
reduce complexity in one respect (yes, I know it looks complex enough):
it makes waiting for completion of an in-progress IO simpler than in the
existing synchronous IO case, since librt takes care of the waiting;
specifically, there is no need for extra wait-for-io control blocks
such as in bufmgr's WaitIO().

This also plays very nicely with syncscan, where the cohorts run close
together.

If anyone could advise whether this also works on MacOS/BSD or Windows,
that would be good, as I can't verify it myself.

> 
> How portable is POSIX aio nowadays? Googling around, it still seems that 
> on Linux, it's implemented using threads. Does the thread-emulation 
> implementation cause problems with the rest of the backend, which 
> assumes that there is only a single thread? In any case, I think we'll 

Good question. I am not aware of any dependency on a backend having only
a single thread. Can you please point me to such dependencies?
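The thread-pool emulation mentioned above can be illustrated in miniature. This is a conceptual sketch of prefetching by issuing reads ahead of time through worker threads (roughly what glibc's POSIX AIO does internally), not the patch's actual smgr-level API:

```python
import concurrent.futures
import os

def make_prefetcher(path: str, block_size: int = 8192, workers: int = 4):
    """Issue pread()s for upcoming blocks ahead of time, then collect
    each result on demand, blocking only if that read is still in
    flight.  Names here are illustrative only."""
    fd = os.open(path, os.O_RDONLY)
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=workers)
    futures = {}

    def prefetch(blocknum: int) -> None:
        futures[blocknum] = pool.submit(os.pread, fd, block_size,
                                        blocknum * block_size)

    def read_block(blocknum: int) -> bytes:
        if blocknum not in futures:      # not prefetched: read synchronously
            prefetch(blocknum)
        return futures.pop(blocknum).result()

    return prefetch, read_block
```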


> want to encapsulate the AIO implementation behind some kind of an API, 
> to allow other implementations to co-exist.

It is already: it follows the smgr (storage manager) -> md (magnetic disk) ->
fd (file descriptor) layering used for the existing filesystem ops,
including posix_fadvise.

> 
> Benchmarks?

I will see if I can package mine up somehow.
It would be great if anyone else would like to benchmark it on theirs...


> - Heikki
  

Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread John Lumby
I have pasted below the EXPLAIN of one of my benchmark queries
(the one I reference in the README).
Plenty of nested loop joins.
However, I think I understand your question as to how effective it would be
if the outer is not sorted, and I will see if I can dig into that
if I get time (and it sounds as though Claudio is on it too).

The area of exactly what the best prefetch strategy should be for
each particular type of scan and context is a good one to work on.

John

> Date: Wed, 28 May 2014 18:12:23 -0700
> Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
> and patch
> From: p...@heroku.com
> To: klaussfre...@gmail.com
> CC: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
> 
> On Wed, May 28, 2014 at 5:59 PM, Claudio Freire  
> wrote:
> > For nestloop, correct me if I'm wrong, but index scan nodes don't have
> > visibility of the next tuple to be searched for.
> 
> Nested loop joins are considered a particularly compelling case for
> prefetching, actually.
> 
> -- 
> Peter Geoghegan

-
 QUERY PLAN 

  

 Limit  (cost=801294.81..801294.81 rows=2 width=532)
   CTE deploy_zone_down
 ->  Recursive Union  (cost=1061.25..2687.40 rows=11 width=573)
   ->  Nested Loop  (cost=1061.25..1423.74 rows=1 width=41)
 ->  Nested Loop  (cost=1061.11..1421.22 rows=14 width=49)
   ->  Bitmap Heap Scan on entity zone_tree  
(cost=1060.67..1175.80 rows=29 width=40)
 Recheck Cond: ((name >= 'testZone-4375'::text) AND 
(name <= 'testZone-5499'::text) AND ((discriminator)::text = 'ZONE'::text))
 ->  BitmapAnd  (cost=1060.67..1060.67 rows=29 
width=0)
   ->  Bitmap Index Scan on entity_name  
(cost=0.00..139.71 rows=5927 width=0)
 Index Cond: ((name >= 
'testZone-4375'::text) AND (name <= 'testZone-5499'::text))
   ->  Bitmap Index Scan on 
entity_discriminatorx  (cost=0.00..920.70 rows=49636 width=0)
 Index Cond: ((discriminator)::text = 
'ZONE'::text)
   ->  Index Scan using metadata_value_owner_id on 
metadata_value mv  (cost=0.43..8.45 rows=1 width=17)
 Index Cond: (owner_id = zone_tree.id)
 ->  Index Scan using metadata_field_pkey on metadata_field mf  
(cost=0.15..0.17 rows=1 width=8)
   Index Cond: (id = mv.field_id)
   Filter: ((name)::text = 'deployable'::text)
   ->  Nested Loop  (cost=0.87..126.34 rows=1 width=573)
 ->  Nested Loop  (cost=0.72..125.44 rows=5 width=581)
   ->  Nested Loop  (cost=0.29..83.42 rows=10 width=572)
 ->  WorkTable Scan on deploy_zone_down dzd  
(cost=0.00..0.20 rows=10 width=540)
 ->  Index Scan using 
entity_discriminator_parent_zone on entity ch  (cost=0.29..8.31 rows=1 width=40)
   Index Cond: ((parent_id = dzd.zone_tree_id) 
AND ((discriminator)::text = 'ZONE'::text))
   ->  Index Scan using metadata_value_owner_id on 
metadata_value mv_1  (cost=0.43..4.19 rows=1 width=17)
 Index Cond: (owner_id = ch.id)
 ->  Index Scan using metadata_field_pkey on metadata_field 
mf_1  (cost=0.15..0.17 rows=1 width=8)
   Index Cond: (id = mv_1.field_id)
   Filter: ((name)::text = 'deployable'::text)
   CTE deploy_zone_tree
 ->  Recursive Union  (cost=0.00..9336.89 rows=21 width=1105)
   ->  CTE Scan on deploy_zone_down dzd_1  (cost=0.00..0.22 rows=11 
width=1105)
   ->  Nested Loop  (cost=0.43..933.63 rows=1 width=594)
 ->  WorkTable Scan on deploy_zone_tree dzt  (cost=0.00..2.20 
rows=110 width=581)
 ->  Index Scan using entity_pkey on entity pt  
(cost=0.43..8.46 rows=1 width=21)
   Index Cond: (id = dzt.zone_tree_ancestor_parent_id)
   Filter: ((discriminator)::text = ANY 
('{VIEW,ZONE}'::text[]))
   CTE forward_host_ip
 ->  Nested Loop  (cost=1.30..149.65 rows=24 width=88)
   ->  Nested Loop  (cost=0.87..121.69 rows=48 width=56)
 ->  Nested Loop

Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix bogus %name-prefix option syntax in all our Bison files.

2014-05-29 Thread Andres Freund
On 2014-05-29 08:33:05 -0400, Peter Eisentraut wrote:
> On 5/28/14, 7:02 PM, Andres Freund wrote:
> > That's a good idea. What i've been thinking about is to add
> > -Wno-deprecated to the bison rule in the interim. Maybe after a
> > configure test for the option. All deprecation warnings so far seem to
> > be pretty unhelpful.
> 
> Here is a patch.

FWIW, I vote for applying something like it. It seems better to collect
the -Wno-deprecated in one place (i.e. configure) instead of having it
in every developer's Makefile.custom. The latter will be harder to get
rid of.
I'd add a comment about why it's been added though. I won't remember why
at least...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
El 28/05/2014 22:12, "Peter Geoghegan"  escribió:
>
> On Wed, May 28, 2014 at 5:59 PM, Claudio Freire 
wrote:
> > For nestloop, correct me if I'm wrong, but index scan nodes don't have
> > visibility of the next tuple to be searched for.
>
> Nested loop joins are considered a particularly compelling case for
> prefetching, actually.

Of course. I only doubt it can be implemented without fairly large changes
to all execution nodes.

I'll look into it

>
> --
> Peter Geoghegan


Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Tom Lane
Andres Freund  writes:
> On 2014-05-29 08:49:38 -0400, Peter Eisentraut wrote:
>> Well, we could have just hacked up that particular header check to do
>> what we want.

> Still wouldn't have solved that ossp already didn't work on several
> platforms at all anymore and is likely to work on even fewer in 5 years.

Yeah.  The problem is not with the header check, it's with the fact that
OSSP UUID is basically broken on several modern platforms.[1]  We were
going to have to do something about that pretty soon anyway.  I agree that
this isn't the most ideal time in the dev cycle to do something about it,
but fixing portability issues is one of the expected beta-time activities,
no?  That's really what this is.

regards, tom lane

[1] Quite aside from compilation problems, yesterday's testing results
suggest that it can't read the system MAC address at all on Debian:
http://www.postgresql.org/message-id/29063.1401333...@sss.pgh.pa.us




Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix bogus %name-prefix option syntax in all our Bison files.

2014-05-29 Thread Tom Lane
Andres Freund  writes:
> FWIW, I vote for applying something like it. It seems better to collect
> the -Wno-deprecated in one place (i.e. configure) instead of having it
> in every developer's Makefile.custom. The latter will be harder to get
> rid of.

Yeah, that's a good point.

> I'd add a comment about why it's been added though. I won't remember why
> at least...

+1

regards, tom lane




Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Tom Lane
Andrew Dunstan  writes:
> On 05/29/2014 08:21 AM, Andres Freund wrote:
>> Yes, it'd have been nice if this were done a month+ ago. But nobody
>> stepped up :(. Seems like the least bad choice :/

> The most worrying thing is that we didn't find the occasioning problem 
> when we switched to autoconf 2.69 back in December, so we end up dealing 
> with bad choices now.

I think the main reason for that is that none of the buildfarm animals
were trying to build contrib/uuid-ossp on machines where OSSP UUID is
shaky.  Some of our packagers do so, though, and so once the beta got
into their hands we found out about the problem.

Short of adding "try to package according to Debian, Red Hat, etc
packaging recipes" to the buildfarm requirements, there doesn't seem to
be much that we could do to find out about such issues earlier.  (And
no, I'm not proposing that as a good idea.  As an ex-packager, I know
that those recipes are moving targets because of constantly changing
external requirements.  We don't want to take over that maintenance.)

So IMO we just have to expect that beta is going to turn up portability
issues, and we're going to have to do what it takes to resolve them,
even if it's nontrivial.

The good news on this front is that we have substantially more buildfarm
coverage of contrib/uuid-ossp than we had two days ago.

regards, tom lane




Re: [HACKERS] SP-GiST bug.

2014-05-29 Thread Teodor Sigaev

I don't think that checkAllTheSame is broken, but it probably would be
after this patch --- ISTM you're breaking the corner case described in the
header comment for checkAllTheSame, where we need to force splitting of a
set of existing tuples that the opclass can't distinguish.
IMHO, this patch will not break it, because it forces (in the corner case)
creation of a node for the new tuple but excludes it from the list to insert.
On the next iteration we will find the needed node with an empty list.


Comment:
 * If we know that the leaf tuples wouldn't all fit on one page, then we
 * exclude the last tuple (which is the incoming new tuple that forced a split)
 * from the check to see if more than one node is used.  The reason for this
 * is that if the existing tuples are put into only one chain, then even if
 * we move them all to an empty page, there would still not be room for the
 * new tuple, so we'd get into an infinite loop of picksplit attempts.

If we reserve a node for the new tuple then on the next iteration we will
not fall into picksplit again; we will have an empty list, so the new tuple
will fall onto another page.


BTW, look at the following example:
create table xxx (p point);

insert into xxx select '(1,1)'::point from generate_series(1, 226) as v;
insert into xxx values ('(3,3)'::point);

create index xxxidx on xxx using spgist (p quad_point_ops);

It is easy to see that the single node is allTheSame although the underlying
leaves are different. Due to the allTheSame search logic, scans are correct but
inefficient. The patch fixes that too.




What actually is broken, I think, is the spgist text opclass, because it's
using a zero node label for two different situations.  The scenario we're
looking at here is:


Agreed that there are different situations, but I disagree that it's broken :)


> It seems like we have to
> distinguish the case of zero-length string from that of a dummy label
> for a pushed-down allTheSame tuple.

Actually, we do this: if allTheSame is true then nodes have dummy labels.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] json/jsonb inconsistence - 2

2014-05-29 Thread Andrew Dunstan


On 05/29/2014 08:15 AM, Andrew Dunstan wrote:


On 05/29/2014 08:00 AM, Teodor Sigaev wrote:

postgres=# select '["\u0000"]'::json->0;
 ?column?
----------
 "\u0000"
(1 row)

Time: 1,294 ms
postgres=# select  '["\u0000"]'::jsonb->0;
 ?column?
-----------
 "\\u0000"
(1 row)

It seems to me that escape_json() is wrongly used in
jsonb_put_escaped_value(); the right name for escape_json() would be
escape_to_json().



That's a bug. I will look into it. I think we might need to 
special-case \u0000 on output, just as we do on input.



Actually, this is just the tip of the iceberg.

Here's what 9.3 does:

   andrew=# select array_to_json(array['a','\u0000','b']::text[]);
     array_to_json
   ---------------------
    ["a","\\u0000","b"]


I'm now wondering if we should pass through any unicode escape 
(presumably validated to some extent). I guess we can't change this in 
9.2/9.3 because it would be a behaviour change.


These unicode escapes have given us more trouble than any other part of 
the JSON spec :-(


cheers

andrew




Re: [HACKERS] Ancient bug in formatting.c/to_char()

2014-05-29 Thread Peter Geoghegan
On Thu, May 29, 2014 at 3:09 AM, Heikki Linnakangas
 wrote:
> Yeah, the code is pretty crufty, I'm not sure what the proper fix would be.
> I wonder if psprintf would be useful.


I don't know that it's worth worrying about the case you highlight. It
is certainly worth worrying about the lack of a similar fix in
float4_to_char(), though.

-- 
Peter Geoghegan




Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix bogus %name-prefix option syntax in all our Bison files.

2014-05-29 Thread Martijn van Oosterhout
On Thu, May 29, 2014 at 08:33:05AM -0400, Peter Eisentraut wrote:
> On 5/28/14, 7:02 PM, Andres Freund wrote:
> > That's a good idea. What i've been thinking about is to add
> > -Wno-deprecated to the bison rule in the interim. Maybe after a
> > configure test for the option. All deprecation warnings so far seem to
> > be pretty unhelpful.
> 
> Here is a patch.
> 

Does this need a comment indicating why it's needed and when it can be
removed?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




[HACKERS] recovery testing for beta

2014-05-29 Thread Jeff Janes
What features in 9.4 need more beta testing for recovery?

I've applied my partial-write testing harness to several scenarios in 9.4.
 So far it's found a recovery bug for gin indexes, a recovery bug for btree,
a vacuum bug for btree indexes (with foreign keys, but that is not relevant
to the bug), and nothing of interest for gist indexes, although it only
tested "where text_array @@ to_tsquery(?)" queries.

It also implicitly tested the xlog parallel write slots thing, as that is
common code to all recovery.

I also applied the foreign key test retroactively to 9.3, and it quickly
re-found the multixact bugs up until commit 9a57858f1103b89a5674.  The test
was designed only with the knowledge that the bugs involved foreign keys
and the consumption of multixacts.   I had no deeper knowledge of the
details of those bugs when designing the test, so I have a reasonable
amount of confidence that this could have found them in real time had I
bothered to try to test the feature during the previous beta cycle.

So, what else in 9.4 needs testing for recovery from crashes in general or
partial-writes in particular?

One thing is that I want to find a way to drive multixact in fast forward,
so that the freezing cycle gets a good workout. Currently I can't consume
enough of them to make them wrap around within the time frame of a test.

It looks like the jsonb stuff only adds new operators for use by existing
index types, so it is probably not a high risk for recovery bugs.  I will
probably try to test it anyway as a way to become more familiar with the
feature.  I don't really know about the logical streaming stuff.

These are the recent threads on hackers.  The first one has a link to the
harness variant which is set up for the foreign key testing.

"9.4 btree index corruption"
"9.4 checksum error in recovery with btree index"
"9.4 checksum errors in recovery with gin index"

Cheers,

Jeff


Re: [HACKERS] json/jsonb inconsistence - 2

2014-05-29 Thread Teodor Sigaev

Actually, this is just the tip of the iceberg.



Funny, but
postgres=# select '\u0000'::text;
  text
--------
 \u0000
(1 row)

postgres=# select array['\u0000']::text[];
    array
-------------
 {"\\u0000"}

postgres=# select '{"\u0000"}'::text[];
  text
---------
 {u0000}

postgres=# select '{"\\u0000"}'::text[];
    text
-------------
 {"\\u0000"}

&%) This drives me crazy. I suppose we shouldn't worry about the array type; 
just make the output the same for json and jsonb.

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd uuid-ossp behavior on smew and shearwater

2014-05-29 Thread Josh Kupershmidt
On Wed, May 28, 2014 at 11:23 PM, Tom Lane  wrote:
> Buildfarm critters smew and shearwater are reporting regression test
> failures that suggest that the UUID library can't get a system MAC
> address:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=smew&dt=2014-05-28%2023%3A38%3A28
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=shearwater&dt=2014-05-29%2000%3A24%3A32
>
> I've just committed regression test adjustments to prevent that from
> being a failure case, but I am confused about why it's happening.
> I wouldn't be surprised at not getting a MAC address on a machine that
> lacks any internet connection, but that surely can't describe the
> buildfarm environment.  Are you curious enough to poke into it and
> see what's going on?  It might be useful to strace a backend that's
> trying to execute uuid_generate_v1() and see what the kernel interaction
> looks like exactly.

Here's the result of attaching strace to an idle backend, then running
SELECT uuid_generate_v1(). AFAIR shearwater is a cheaply-hosted OpenVZ
VPS under the hood.

Josh
josh@ease1:~$ strace -p 6818
Process 6818 attached - interrupt to quit
recv(10, "Q\0\0\0\37SELECT uuid_generate_v1();\0", 8192, 0) = 32
gettimeofday({1401383296, 920282}, NULL) = 0
gettimeofday({1401383296, 920313}, NULL) = 0
mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0xae80
stat64("/home/josh/runtime/lib/postgresql/uuid-ossp", 0xbfdf87d0) = -1 ENOENT 
(No such file or directory)
stat64("/home/josh/runtime/lib/postgresql/uuid-ossp.so", {st_mode=S_IFREG|0755, 
st_size=46685, ...}) = 0
stat64("/home/josh/runtime/lib/postgresql/uuid-ossp.so", {st_mode=S_IFREG|0755, 
st_size=46685, ...}) = 0
open("/home/josh/runtime/lib/postgresql/uuid-ossp.so", O_RDONLY) = 7
read(7, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\f\0\0004\0\0\0"..., 
512) = 512
fstat64(7, {st_mode=S_IFREG|0755, st_size=46685, ...}) = 0
mmap2(NULL, 16796, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 
0xae7fb000
mmap2(0xae7ff000, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x3) = 0xae7ff000
close(7)= 0
open("/home/josh/runtime/lib/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file 
or directory)
open("tls/i686/sse2/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or 
directory)
open("tls/i686/sse2/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or 
directory)
open("tls/i686/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or 
directory)
open("tls/i686/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/sse2/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or 
directory)
open("tls/sse2/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("tls/libuuid.so.1", O_RDONLY)  = -1 ENOENT (No such file or directory)
open("i686/sse2/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or 
directory)
open("i686/sse2/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("i686/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("i686/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("sse2/cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("sse2/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("cmov/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("libuuid.so.1", O_RDONLY)  = -1 ENOENT (No such file or directory)
open("/home/josh/runtime/lib/libuuid.so.1", O_RDONLY) = -1 ENOENT (No such file 
or directory)
open("/etc/ld.so.cache", O_RDONLY)  = 7
fstat64(7, {st_mode=S_IFREG|0644, st_size=30628, ...}) = 0
mmap2(NULL, 30628, PROT_READ, MAP_PRIVATE, 7, 0) = 0xae7f3000
close(7)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libuuid.so.1", O_RDONLY) = 7
read(7, 
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260\22\0\0004\0\0\0"..., 512) 
= 512
fstat64(7, {st_mode=S_IFREG|0644, st_size=18000, ...}) = 0
mmap2(NULL, 20712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 
0xae7ed000
mmap2(0xae7f1000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x3) = 0xae7f1000
close(7)= 0
mprotect(0xae7f1000, 4096, PROT_READ)   = 0
munmap(0xae7f3000, 30628)   = 0
socket(PF_FILE, SOCK_STREAM, 0) = 7
connect(7, {sa_family=AF_FILE, path="/var/run/uuidd/request"}, 110) = -1 ENOENT 
(No such file or directory)
access("/usr/sbin/uuidd", X_OK) = -1 ENOENT (No such file or directory)
close(7)= 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 7
ioctl(7, SIOCGIFCONF, {96, {{"lo", {AF_INET, inet_addr("127.0.0.1")}}, 
{"venet0", {AF_INET, inet_addr("127.0.0.2")}}, {"venet0:0", {AF_INET, 
inet_addr("198.204.250.34")) = 0

Re: [HACKERS] Compression of full-page-writes

2014-05-29 Thread Bruce Momjian
On Thu, May 29, 2014 at 11:21:44AM +0100, Simon Riggs wrote:
> > Uh, how would that work if you want to compress the background_FPWs?
> > Use compressed_background_FPWs?
> 
> We've currently got 1 technique for torn page protection, soon to have
> 2 and with a 3rd on the horizon and likely to receive effort in next
> release.
> 
> It seems sensible to have just one parameter to describe the various
> techniques, as suggested. I'm suggesting that we plan for how things
> will look when we have the 3rd one as well.
> 
> Alternate suggestions welcome.

I was just pointing out that we might need compression to be a separate
boolean variable from the type of page tear protection.  I know I am
usually anti-adding-variables, but in this case it seems trying to have
one variable control several things will lead to confusion.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Bruce Momjian
On Thu, May 29, 2014 at 02:21:57PM +0200, Andres Freund wrote:
> On 2014-05-29 08:14:48 -0400, Peter Eisentraut wrote:
> > On 5/27/14, 10:37 PM, Tom Lane wrote:
> > > If you don't like this change, we can revert it and also revert the 
> > > upgrade to 2.69.
> > 
> > Nobody else appears to be concerned, but I would have preferred this option.
> 
> I am pretty concerned actually. But I don't see downgrading to an
> earlier autoconf as something really helpful. There already was a huge
> portability problem with the current code. Avoiding the autoconf update
> for a while wouldn't have solved it. And in 5 years time the amount of
> portability problems will be much larger.
> Yes, it'd have been nice if this were done a month+ ago. But nobody
> stepped up :(. Seems like the least bad choice :/

One thing that concerns me is that we already had the problem that users
creating the uuid-ossp extension had to double-quote the name because of
the dash, and we have regularly questioned the viability of the
uuid-ossp codebase.

Now that we know we have to support alternatives, we are changing the
build API to support those alternatives, but doing nothing to decouple
the extension name from uuid-ossp and the dash issue.  

Seems this would be the logical time to just break compatibility and get
a sane API for UUID generation.  We could hack pg_dump --binary-upgrade
to map a dump of "uuid-ossp" to some other name, or hack the backend to
accept "uuid-ossp" and map that to something more sane.

I basically feel this change is making our bad API even more confusing. 
I think the only good thing about this change is that the _user_ API, as
odd as it is, remains the same.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] recovery testing for beta

2014-05-29 Thread Bruce Momjian
On Thu, May 29, 2014 at 09:39:56AM -0700, Jeff Janes wrote:
> What features in 9.4 need more beta testing for recovery?
> 
> I've applied my partial-write testing harness to several scenarios in 9.4.  So
> far its found a recovery bug for gin indexes, a recovery bug for btree, a
> vacuum bug for btree indexes (with foreign keys, but that is not relevant to
> the bug), and nothing of interest for gist index, although it only tested
> "where text_array @@ to_tsquery(?)" queries.  
> 
> It also implicitly tested the xlog parallel write slots thing, as that is
> common code to all recovery.
> 
> I also applied the foreign key test retroactively to 9.3, and it quickly
> re-found the multixact bugs up until commit 9a57858f1103b89a5674.  The test 
> was
> designed only with the knowledge that the bugs involved foreign keys and the
> consumption of multixacts.   I had no deeper knowledge of the details of those
> bugs when designing the test, so I have a reasonable amount of confidence that
> this could have found them in real time had I bothered to try to test the
> feature during the previous beta cycle.

Wow, that is impressive!  We are looking for ways to find bugs like
the ones that appeared in 9.3.X, and it seems you might have found a way.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Tom Lane
Bruce Momjian  writes:
> One thing that concerns me is that we already had the problem that users
> creating the uuid-ossp extension had to double-quote the name because of
> the dash, and we have regularly questioned the viability of the
> uuid-ossp codebase.

> Now that we know we have to support alternatives, we are changing the
> build API to support those alternatives, but doing nothing to decouple
> the extension name from uuid-ossp and the dash issue.  

Well, if you've got a proposal for how to rename the extension without
creating major compatibility problems, let's hear it.

> Seems this would be the logical time to just break compatibility and get
> a sane API for UUID generation.

Most people think the "sane API" would be to put the functionality in
core, and forget about any extension name at all.  The compatibility
problems with that approach aren't exactly trivial either, but I suspect
that's where we'll end up in the long run.  So I'm not that excited about
kluge solutions for renaming the extension.

regards, tom lane




Re: [HACKERS] Odd uuid-ossp behavior on smew and shearwater

2014-05-29 Thread Tom Lane
Josh Kupershmidt  writes:
> On Wed, May 28, 2014 at 11:23 PM, Tom Lane  wrote:
>> I've just committed regression test adjustments to prevent that from
>> being a failure case, but I am confused about why it's happening.
>> I wouldn't be surprised at not getting a MAC address on a machine that
>> lacks any internet connection, but that surely can't describe the
>> buildfarm environment.  Are you curious enough to poke into it and
>> see what's going on?  It might be useful to strace a backend that's
>> trying to execute uuid_generate_v1() and see what the kernel interaction
>> looks like exactly.

> Here's the result of attaching strace to an idle backend, then running
> SELECT uuid_generate_v1(). AFAIR shearwater is a cheaply-hosted OpenVZ
> VPS under the hood.

Interesting.  Looks like you have access only to virtual network
interfaces, and they report all-zero MAC addresses, which the UUID library
is smart enough to ignore.  If smew is also in a virtual environment
then that's probably the explanation.  (There are some other buildfarm
critters that are reporting MAC addresses with the local-admin bit set,
which I suspect also means they've got virtual network interfaces, but
with a different treatment of the what-to-report problem.)

regards, tom lane




Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-05-29 Thread Bruce Momjian
On Thu, May 29, 2014 at 01:56:22PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > One thing that concerns me is that we already had the problem that users
> > creating the uuid-ossp extension had to double-quote the name because of
> > the dash, and we have regularly questioned the viability of the
> > uuid-ossp codebase.
> 
> > Now that we know we have to support alternatives, we are changing the
> > build API to support those alternatives, but doing nothing to decouple
> > the extension name from uuid-ossp and the dash issue.  
> 
> Well, if you've got a proposal for how to rename the extension without
> creating major compatibility problems, let's hear it.

Well, the only two I could think of were the pg_dump and backend mapping
changes;  pretty hacky.

> > Seems this would be the logical time to just break compatibility and get
> > a sane API for UUID generation.
> 
> Most people think the "sane API" would be to put the functionality in
> core, and forget about any extension name at all.  The compatibility
> problems with that approach aren't exactly trivial either, but I suspect
> that's where we'll end up in the long run.  So I'm not that excited about
> kluge solutions for renaming the extension.

OK.  I was just worried that users are now using a badly-named uuid-ossp
extension that isn't even using uuid-ossp in many cases.  If we have a
long-term plan to fix this, then you are right that it isn't worth
worrying about it now.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] recovery testing for beta

2014-05-29 Thread Alvaro Herrera
Jeff Janes wrote:

> One thing is that I want to find a way to drive multixact in fast forward,
> so that the freezing cycle gets a good workout. Currently I can't consume
> enough of them to make them wrap around within the time frame of a test.

IIRC I lobotomized it by removing the XLogInsert() call.  That allows
you to generate large amounts of multixacts quickly.  On my laptop this
was able to do four or five wraparound cycles in two to three hours or so,
using the "burn multixact" utility here:
http://www.postgresql.org/message-id/20131231035913.gu22...@eldon.alvh.no-ip.org

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fwd: Typo fixes in Solution.pm, part of MSVC scripts

2014-05-29 Thread Robert Haas
On Sun, May 25, 2014 at 11:34 PM, Michael Paquier
 wrote:
> I noticed a couple of typos in Solution.pm, fixed by the patch
> attached. Those things have been introduced by commits cec8394 (visual
> 2013 support) and 0a78320 (latest pgident run, not actually a problem
> of pgident, because of a misuse of commas).
> The tab problems should not be backpatched to 9.3, but the use of
> commas instead of semicolons should be.

I suspect it would be good, then, to separate this into two patches,
one for back-patch and one not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] recovery testing for beta

2014-05-29 Thread Josh Berkus
On 05/29/2014 09:39 AM, Jeff Janes wrote:
> 
> I've applied my partial-write testing harness to several scenarios in
> 9.4.  So far its found a recovery bug for gin indexes, a recovery bug
> for btree, a vacuum bug for btree indexes (with foreign keys, but that
> is not relevant to the bug), and nothing of interest for gist index,
> although it only tested "where text_array @@ to_tsquery(?)" queries.

This is awesome.  Thanks for doing it!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] SP-GiST bug.

2014-05-29 Thread Tom Lane
Teodor Sigaev  writes:
>> I don't think that checkAllTheSame is broken, but it probably would be
>> after this patch --- ISTM you're breaking the corner case described in the
>> header comment for checkAllTheSame, where we need to force splitting of a
>> set of existing tuples that the opclass can't distinguish.

> INHO, this patch will not break - because it forces (in corner case) to 
> create a 
> node for new tuple but exclude it from list to insert. On next iteration we 
> will 
> find a needed node with empty list.

> Comment:
>   * If we know that the leaf tuples wouldn't all fit on one page, then we
>   * exclude the last tuple (which is the incoming new tuple that forced a 
> split)
>   * from the check to see if more than one node is used.  The reason for this
>   * is that if the existing tuples are put into only one chain, then even if
>   * we move them all to an empty page, there would still not be room for the
>   * new tuple, so we'd get into an infinite loop of picksplit attempts.

> If we reserve a node for new tuple then on next iteration we will not fall 
> into 
> picksplit again - we will have an empty list. So new tuple will fall into 
> another page.

The point I'm making is that the scenario your test case exposes is not
an infinite loop of picksplits, but an infinite loop of choose calls.
There are certainly ways to get into that loop that don't involve this
specific checkAllTheSame logic.  Here's an example:

regression=# create table xxx ( t text);
CREATE TABLE
regression=# create index xxxidx on xxx using spgist (t);
CREATE INDEX
regression=# insert into xxx select repeat('x',4000);
INSERT 0 1
regression=# insert into xxx select repeat('x',4000);
INSERT 0 1
regression=# insert into xxx select repeat('x',4000);
INSERT 0 1
regression=# insert into xxx select repeat('x',3996);

In this example, we get a picksplit at the third insert, and here we
*must* take the allTheSame path because the values are, in fact, all the
same (the SPGIST_MAX_PREFIX_LENGTH constraint means we can only put 3996
characters into the prefix, and then the 3997'th character of each value
is 'x').  Then the fourth insert triggers this same infinite loop, because
spg_text_choose is asked how to insert the slightly-shorter value into the
allTheSame tuple, and it does the wrong thing.

It's certainly possible that we could/should change what checkAllTheSame
is doing --- on re-reading the comment, I'm not really sure that the
scenario it's concerned about can happen.  However, that will not fix
the bug in spgtextproc.c; it will only block one avenue for reaching
the bug.

regards, tom lane




Re: [HACKERS] recovery testing for beta

2014-05-29 Thread Tom Lane
Alvaro Herrera  writes:
> Jeff Janes wrote:
>> One thing is that I want to find a way to drive multixact in fast forward,
>> so that the freezing cycle gets a good workout. Currently I can't consume
>> enough of them to make them wrap around within the time frame of a test.

> IIRC I lobotomized it up by removing the XLogInsert() call.  That allows
> you to generate large amounts of multixacts quickly.  In my laptop this
> was able to do four or five wraparound cycles in two-three hours or so,
> using the "burn multixact" utility here:
> http://www.postgresql.org/message-id/20131231035913.gu22...@eldon.alvh.no-ip.org

Another possibility is to use pg_resetxlog to manually advance the
multixact counter to a point near wraparound.  I think you have to
manually create appropriate slru segment files as well when doing that
(someday we should hack pg_resetxlog to do that for you).  Still, it
might beat waiting hours to burn multixacts one by one.

regards, tom lane




Re: [HACKERS] Odd uuid-ossp behavior on smew and shearwater

2014-05-29 Thread Andrew Dunstan


On 05/29/2014 02:38 PM, Tom Lane wrote:

Josh Kupershmidt  writes:

On Wed, May 28, 2014 at 11:23 PM, Tom Lane  wrote:

I've just committed regression test adjustments to prevent that from
being a failure case, but I am confused about why it's happening.
I wouldn't be surprised at not getting a MAC address on a machine that
lacks any internet connection, but that surely can't describe the
buildfarm environment.  Are you curious enough to poke into it and
see what's going on?  It might be useful to strace a backend that's
trying to execute uuid_generate_v1() and see what the kernel interaction
looks like exactly.

Here's the result of attaching strace to an idle backend, then running
SELECT uuid_generate_v1(). AFAIR shearwater is a cheaply-hosted OpenVZ
VPS under the hood.

Interesting.  Looks like you have access only to virtual network
interfaces, and they report all-zero MAC addresses, which the UUID library
is smart enough to ignore.  If smew is also in a virtual environment
then that's probably the explanation.  (There are some other buildfarm
critters that are reporting MAC addresses with the local-admin bit set,
which I suspect also means they've got virtual network interfaces, but
with a different treatment of the what-to-report problem.)

Almost all my critters run in VMs (all but jacana and bowerbird).

cheers

andrew




Re: [HACKERS] Index-only scans for GIST

2014-05-29 Thread Robert Haas
On Sun, May 25, 2014 at 6:12 AM, Anastasia Lubennikova
 wrote:
> Hi, hackers!
> There are first results of my work on GSoC project "Index-only scans for
> GIST".

Cool.

> 1. Version of my project code is in forked repository
> https://github.com/lubennikovaav/postgres/tree/indexonlygist2
> Patch is in attachments
> - This version is only for one-column indexes

That's probably a limitation that needs to be fixed before this can be
committed.

> - fetch() method is realized only for box opclass (because it's trivial)

That might not need to be fixed before this can be committed.

> 2. I test Index-only scans with SQL script box_test.sql
> and it works really faster. (results in box_test_out)
>
> I'll be glad to get your feedback about this feature.

Since this is a GSoC project, it would be nice if one of the people
who is knowledgeable about GIST (Heikki, Alexander, etc.) could weigh
in on this before too much time goes by, so that Anastasia can press
forward with this work.

I don't know enough to offer too many substantive comments, but I
think you should remove all of the //-style comments (most of which
are debugging leftovers) and add some more comments describing what
you're actually doing and, more importantly, why.

This comment doesn't appear to make sense:

+   /*
+* The offset number on tuples on internal pages is unused.
For historical
+* reasons, it is set 0x.
+*/

The reason this doesn't make sense is because the tuple in question is
not on an internal page, or indeed any page at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Heikki Linnakangas

On 05/29/2014 04:12 PM, John Lumby wrote:

Thanks for looking at it!


Date: Thu, 29 May 2014 00:19:33 +0300
From: hlinnakan...@vmware.com
To: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
CC: klaussfre...@gmail.com
Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
and patch

On 05/28/2014 11:52 PM, John Lumby wrote:




The patch seems to assume that you can put the aiocb struct in shared
memory, initiate an asynchronous I/O request from one process, and wait
for its completion from another process. I'm pretty surprised if that
works on any platform.


It works on Linux.    Actually, this ability allows the asyncio implementation to
reduce complexity in one respect (yes, I know it looks complex enough):
it makes waiting for completion of an in-progress IO simpler than in
the existing synchronous IO case, since librt takes care of the waiting.
Specifically, there is no need for extra wait-for-io control blocks
such as in bufmgr's WaitIO().


[checks]. No, it doesn't work. See attached test program.

It kinda seems to work sometimes, because of the way it's implemented in 
glibc. The aiocb struct has a field for the result value and errno, and 
when the I/O is finished, the worker thread fills them in. aio_error() 
and aio_return() just return the values of those fields, so calling 
aio_error() or aio_return() do in fact happen to work from a different 
process. aio_suspend(), however, is implemented by sleeping on a 
process-local mutex, which does not work from a different process.


Even if it worked on Linux today, it would be a bad idea to rely on it 
from a portability point of view. No, the only sane way to make this 
work is that the process that initiates an I/O request is responsible 
for completing it. If another process needs to wait for an async I/O to 
complete, we must use some other means to do the waiting. Like the 
io_in_progress_lock that we already have, for the same purpose.


- Heikki

/*
 * Test program to test if POSIX aio functions work across processes
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <aio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>


char *shmem;

void
processA(void)
{
	int fd;
	struct aiocb *aiocbp = (struct aiocb *) shmem;
	char *buf = shmem + sizeof(struct aiocb);

	fd = open("aio-shmem-test-file", O_CREAT | O_WRONLY | O_SYNC, S_IRWXU);
	if (fd == -1)
	{
		fprintf(stderr, "open() failed\n");
		exit(1);
	}
	printf("processA starting AIO\n");

	strcpy(buf, "foobar");

	memset(aiocbp, 0, sizeof(struct aiocb));
	aiocbp->aio_fildes = fd;
	aiocbp->aio_offset = 0;
	aiocbp->aio_buf = buf;
	aiocbp->aio_nbytes = strlen(buf);
	aiocbp->aio_reqprio = 0;
	aiocbp->aio_sigevent.sigev_notify = SIGEV_NONE;

	if (aio_write(aiocbp) != 0)
	{
		fprintf(stderr, "aio_write() failed\n");
		exit(1);
	}
}

void
processB(void)
{
	struct aiocb *aiocbp = (struct aiocb *) shmem;
	const struct aiocb * const pl[1] = { aiocbp };
	int rv;


	printf("waiting for the write to finish in process B\n");

	if (aio_suspend(pl, 1, NULL) != 0)
	{
		fprintf(stderr, "aio_suspend() failed\n");
		exit(1);
	}
	printf("aio_suspend() returned 0\n");

	rv = aio_error(aiocbp);
	if (rv != 0)
	{
		fprintf(stderr, "aio_error returned %d: %s\n", rv, strerror(rv));
		exit(1);
	}
	rv = aio_return(aiocbp);
	printf("aio_return returned %d\n", rv);
}



int main(int argc, char **argv)
{
	int pidB;

	shmem = mmap(NULL, sizeof(struct aiocb) + 1000,
 PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
 -1, 0);
	if (shmem == MAP_FAILED)
	{
		fprintf(stderr, "mmap() failed\n");
		exit(1);
	}

#ifdef SINGLE_PROCESS
	/* this works */
	processA();
	processB();
#else
	/*
	 * Start the I/O request in parent process, then fork and try to wait
	 * for it to finish from the child process. (doesn't work, it will hang
	 * forever)
	 */
	processA();

	pidB = fork();
	if (pidB == -1)
	{
		fprintf(stderr, "fork() failed\n");
		exit(1);
	}
	if (pidB != 0)
	{
		/* parent */
		waitpid(pidB, NULL, 0);
	}
	else
	{
		/* child */
		processB();
	}
#endif
}



[HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Tom Lane
One of the annoying things about the SPGiST bug Teodor pointed out is
that once you're in the infinite loop, query cancel doesn't work to get
you out of it.  There are a couple of CHECK_FOR_INTERRUPTS() calls in
spgdoinsert() that are meant to get you out of exactly this sort of
buggy-opclass situation --- but they don't help.  The reason is that
when we reach that point we're holding one or more buffer locks, which
means the HOLD_INTERRUPTS count is positive, so ProcessInterrupts is
afraid to do anything.

In point of fact, we'd be happy to give up the locks and then throw
the error.  So I was thinking about inventing some macro or out-of-line
function that went about like this:

if (InterruptPending && (QueryCancelPending || ProcDiePending))
{
LWLockReleaseAll();
ProcessInterrupts();
elog(ERROR, "ProcessInterrupts failed to throw an error");
}

where we don't expect to reach the final elog; it's just there to make
sure we don't return control after yanking locks out from under the
caller.

You might object that this is risky since it might release locks other
than the buffer locks spgdoinsert() is specifically concerned with.
However, if spgdoinsert() were to throw an error for a reason other than
query cancel, any such locks would get released anyway as the first step
during error cleanup, so I think this is not a real concern.

There may be other places in the index AMs or other low-level code where
this'd be appropriate; I've not really looked.

Thoughts?

regards, tom lane




Re: [HACKERS] Re: Logical slots not mentioned in CREATE_REPLICATION_SLOT for replication protocol docs

2014-05-29 Thread Robert Haas
On Thu, May 22, 2014 at 12:18 AM, Michael Paquier
 wrote:
> On Thu, May 22, 2014 at 12:44 PM, Michael Paquier
>  wrote:
>> Hi all,
>>
>> As written in subject, replication protocol documentation lacks
>> details about logical slots in CREATE_REPLICATION_SLOT command:
>> http://www.postgresql.org/docs/devel/static/protocol-replication.html
>> Attached is a patch correcting that.
> An additional thing I noticed: START_REPLICATION does not mention that
> it is possible to specify options for the output plugin. All the fixes
> are included in the patch attached.

Thanks, this looks good.  But shouldn't the bit about output plugin
options mention say something like:

( option_name option_argument [, ...] )

...instead of just:

( option [, ...] )

?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Heikki Linnakangas

On 05/29/2014 11:25 PM, Tom Lane wrote:

One of the annoying things about the SPGiST bug Teodor pointed out is
that once you're in the infinite loop, query cancel doesn't work to get
you out of it.  There are a couple of CHECK_FOR_INTERRUPTS() calls in
spgdoinsert() that are meant to get you out of exactly this sort of
buggy-opclass situation --- but they don't help.  The reason is that
when we reach that point we're holding one or more buffer locks, which
means the HOLD_INTERRUPTS count is positive, so ProcessInterrupts is
afraid to do anything.

In point of fact, we'd be happy to give up the locks and then throw
the error.  So I was thinking about inventing some macro or out-of-line
function that went about like this:

if (InterruptPending && (QueryCancelPending || ProcDiePending))
{
LWLockReleaseAll();
ProcessInterrupts();
elog(ERROR, "ProcessInterrupts failed to throw an error");
}

where we don't expect to reach the final elog; it's just there to make
sure we don't return control after yanking locks out from under the
caller.


Also checking that CritSectionCount == 0 seems like a good idea...

- Heikki




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
 wrote:
> On 05/29/2014 04:12 PM, John Lumby wrote:
>>
>> Thanks for looking at it!
>>
>>> Date: Thu, 29 May 2014 00:19:33 +0300
>>> From: hlinnakan...@vmware.com
>>> To: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
>>> CC: klaussfre...@gmail.com
>>> Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO -
>>> proposal and patch
>>>
>>> On 05/28/2014 11:52 PM, John Lumby wrote:


>>>
>>> The patch seems to assume that you can put the aiocb struct in shared
>>> memory, initiate an asynchronous I/O request from one process, and wait
>>> for its completion from another process. I'm pretty surprised if that
>>> works on any platform.
>>
>>
>> It works on linux.Actually this ability allows the asyncio
>> implementation to
>> reduce complexity in one respect (yes I know it looks complex enough) :
>> it makes waiting for completion of an in-progress IO simpler than for
>> the existing synchronous IO case,.   since librt takes care of the
>> waiting.
>> specifically,   no need for extra wait-for-io control blocks
>> such as in bufmgr's  WaitIO()
>
>
> [checks]. No, it doesn't work. See attached test program.
>
> It kinda seems to work sometimes, because of the way it's implemented in
> glibc. The aiocb struct has a field for the result value and errno, and when
> the I/O is finished, the worker thread fills them in. aio_error() and
> aio_return() just return the values of those fields, so calling aio_error()
> or aio_return() do in fact happen to work from a different process.
> aio_suspend(), however, is implemented by sleeping on a process-local mutex,
> which does not work from a different process.
>
> Even if it worked on Linux today, it would be a bad idea to rely on it from
> a portability point of view. No, the only sane way to make this work is that
> the process that initiates an I/O request is responsible for completing it.
> If another process needs to wait for an async I/O to complete, we must use
> some other means to do the waiting. Like the io_in_progress_lock that we
> already have, for the same purpose.

But calls to it are given a 10us timeout, effectively turning the thing
into polling mode.

What's odd, now that I read carefully, is how 256 waits of 10us amount
to anything. That's just 2.5ms, not enough IIUC for any normal
I/O to complete.




Re: [HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Peter Geoghegan
On Thu, May 29, 2014 at 1:34 PM, Heikki Linnakangas
 wrote:
> Also checking that CritSectionCount == 0 seems like a good idea...


What do you do if it isn't?

-- 
Peter Geoghegan




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Heikki Linnakangas

On 05/29/2014 11:34 PM, Claudio Freire wrote:

On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
 wrote:

On 05/29/2014 04:12 PM, John Lumby wrote:



On 05/28/2014 11:52 PM, John Lumby wrote:

The patch seems to assume that you can put the aiocb struct in shared
memory, initiate an asynchronous I/O request from one process, and wait
for its completion from another process. I'm pretty surprised if that
works on any platform.


It works on linux.Actually this ability allows the asyncio
implementation to
reduce complexity in one respect (yes I know it looks complex enough) :
it makes waiting for completion of an in-progress IO simpler than for
the existing synchronous IO case,.   since librt takes care of the
waiting.
specifically,   no need for extra wait-for-io control blocks
such as in bufmgr's  WaitIO()


[checks]. No, it doesn't work. See attached test program.

It kinda seems to work sometimes, because of the way it's implemented in
glibc. The aiocb struct has a field for the result value and errno, and when
the I/O is finished, the worker thread fills them in. aio_error() and
aio_return() just return the values of those fields, so calling aio_error()
or aio_return() do in fact happen to work from a different process.
aio_suspend(), however, is implemented by sleeping on a process-local mutex,
which does not work from a different process.

Even if it worked on Linux today, it would be a bad idea to rely on it from
a portability point of view. No, the only sane way to make this work is that
the process that initiates an I/O request is responsible for completing it.
If another process needs to wait for an async I/O to complete, we must use
some other means to do the waiting. Like the io_in_progress_lock that we
already have, for the same purpose.


But calls to it are timeouted by 10us, effectively turning the thing
into polling mode.


We don't want polling... And even if we did, calling aio_suspend() in a 
way that's known to be broken, in a loop, is a pretty crappy way of polling.



Which is odd now that I read carefully, is how come 256 waits of 10us
amounts to anything? That's just 2.5ms, not enough IIUC for any normal
I/O to complete


Wild guess: the buffer you're reading is already in OS cache.

- Heikki




Re: [HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Tom Lane
Heikki Linnakangas  writes:
> On 05/29/2014 11:25 PM, Tom Lane wrote:
>> In point of fact, we'd be happy to give up the locks and then throw
>> the error.  So I was thinking about inventing some macro or out-of-line
>> function that went about like this:
>> 
>> if (InterruptPending && (QueryCancelPending || ProcDiePending))
>> {
>>  LWLockReleaseAll();
>>  ProcessInterrupts();
>>  elog(ERROR, "ProcessInterrupts failed to throw an error");
>> }

> Also checking that CritSectionCount == 0 seems like a good idea...

Yeah, and there may be a couple other details like that.  Right now
I'm just thinking about not allowing LWLocks to block the cancel.
I guess something else to consider is whether InterruptHoldoffCount
could be larger than the number of held LWLocks; if so, that would
prevent this from working as desired.

regards, tom lane




Re: [HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, May 29, 2014 at 1:34 PM, Heikki Linnakangas
>  wrote:
>> Also checking that CritSectionCount == 0 seems like a good idea...

> What do you do if it isn't?

Not throw the error.  (If we did, it'd turn into a PANIC, which doesn't
seem like what we want.)

regards, tom lane




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 5:39 PM, Heikki Linnakangas
 wrote:
> On 05/29/2014 11:34 PM, Claudio Freire wrote:
>>
>> On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
>>  wrote:
>>>
>>> On 05/29/2014 04:12 PM, John Lumby wrote:


> On 05/28/2014 11:52 PM, John Lumby wrote:
>
> The patch seems to assume that you can put the aiocb struct in shared
> memory, initiate an asynchronous I/O request from one process, and wait
> for its completion from another process. I'm pretty surprised if that
> works on any platform.


 It works on linux.Actually this ability allows the asyncio
 implementation to
 reduce complexity in one respect (yes I know it looks complex enough) :
 it makes waiting for completion of an in-progress IO simpler than for
 the existing synchronous IO case,.   since librt takes care of the
 waiting.
 specifically,   no need for extra wait-for-io control blocks
 such as in bufmgr's  WaitIO()
>>>
>>>
>>> [checks]. No, it doesn't work. See attached test program.
>>>
>>> It kinda seems to work sometimes, because of the way it's implemented in
>>> glibc. The aiocb struct has a field for the result value and errno, and
>>> when
>>> the I/O is finished, the worker thread fills them in. aio_error() and
>>> aio_return() just return the values of those fields, so calling
>>> aio_error()
>>> or aio_return() do in fact happen to work from a different process.
>>> aio_suspend(), however, is implemented by sleeping on a process-local
>>> mutex,
>>> which does not work from a different process.
>>>
>>> Even if it worked on Linux today, it would be a bad idea to rely on it
>>> from
>>> a portability point of view. No, the only sane way to make this work is
>>> that
>>> the process that initiates an I/O request is responsible for completing
>>> it.
>>> If another process needs to wait for an async I/O to complete, we must
>>> use
>>> some other means to do the waiting. Like the io_in_progress_lock that we
>>> already have, for the same purpose.
>>
>>
>> But calls to it are timeouted by 10us, effectively turning the thing
>> into polling mode.
>
>
> We don't want polling... And even if we did, calling aio_suspend() in a way
> that's known to be broken, in a loop, is a pretty crappy way of polling.

Agreed. Just saying that that's why it works.

The PID of the process responsible for the aio should be there
somewhere, and other processes should explicitly synchronize (or
initiate their own I/O and let the OS merge them - which also works).

>
>
>> Which is odd now that I read carefully, is how come 256 waits of 10us
>> amounts to anything? That's just 2.5ms, not enough IIUC for any normal
>> I/O to complete
>
>
> Wild guess: the buffer you're reading is already in OS cache.

My wild guess as well.




Re: [HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Peter Geoghegan
On Thu, May 29, 2014 at 1:42 PM, Tom Lane  wrote:
> Not throw the error.  (If we did, it'd turn into a PANIC, which doesn't
> seem like what we want.)

Of course, but it isn't obvious to me that that isn't what we'd want,
if the alternative is definitely being stuck in an infinite
uninterruptible loop. (I guess we couldn't just skip throwing the
error, thereby risking proceeding without locks -- so I guess we
can basically do nothing like this in critical sections.) Now, that
probably isn't the actual choice that we must face, but it also isn't
obvious to me how you can do better than not throwing the error (and
doing nothing extra) when in a critical section. That was the intent
of my original question.


-- 
Peter Geoghegan




Re: [HACKERS] CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful

2014-05-29 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, May 29, 2014 at 1:42 PM, Tom Lane  wrote:
>> Not throw the error.  (If we did, it'd turn into a PANIC, which doesn't
>> seem like what we want.)

> Of course, but it isn't obvious to me that that isn't what we'd want,
> if the alternative is definitely that we'd be stuck in an infinite
> uninterruptible loop (I guess we couldn't just avoid throwing the
> error, thereby risking proceeding without locks -- and so I guess we
> can basically do nothing like this in critical sections). Now, that
> probably isn't the actual choice that we must face, but it also isn't
> obvious to me how you can do better than not throwing the error (and
> doing nothing extra) when in a critical section. That was the intent
> of my original question.

I don't feel a need for special behavior for this inside critical
sections; a critical section that lasts long enough for query cancel
response to be problematic is probably already broken by design.

The other case of InterruptHoldoffCount being too high might be more
relevant.  I have not gone around and looked at retail uses of
HOLD_INTERRUPTS to see whether there are any that might be holding counts
for long periods.  But I have checked that for the case in spgdoinsert,
the only thing staying ProcessInterrupts' hand is the holdoff count
associated with the buffer lock on the current index page.

regards, tom lane




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Tom Lane
Claudio Freire  writes:
> Didn't fix that, but the attached patch does fix regression tests when
> scanning over index types other than btree (was invoking elog when the
> index am didn't have ampeeknexttuple)

"ampeeknexttuple"?  That's a bit scary.  It would certainly be unsafe
for non-MVCC snapshots (read about vacuum vs indexscan interlocks in
nbtree/README).

regards, tom lane




Re: [HACKERS] Odd uuid-ossp behavior on smew and shearwater

2014-05-29 Thread Josh Kupershmidt
On Thu, May 29, 2014 at 4:06 PM, Andrew Dunstan  wrote:
> On 05/29/2014 02:38 PM, Tom Lane wrote:
>> Josh Kupershmidt  writes:
>> Interesting.  Looks like you have access only to virtual network
>> interfaces, and they report all-zero MAC addresses, which the UUID library
>> is smart enough to ignore.  If smew is also in a virtual environment
>> then that's probably the explanation.  (There are some other buildfarm
>> critters that are reporting MAC addresses with the local-admin bit set,
>> which I suspect also means they've got virtual network interfaces, but
>> with a different treatment of the what-to-report problem.)

> Almost all my critters run in VMs (all but jacana and bowerbird).

They're not running on OpenVZ, are they? `ifconfig` on shearwater says:

venet0Link encap:UNSPEC  HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
  inet addr:127.0.0.2  P-t-P:127.0.0.2  Bcast:0.0.0.0
Mask:255.255.255.255
  UP BROADCAST POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1
  RX packets:1409294 errors:0 dropped:0 overruns:0 frame:0
  TX packets:1488401 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:751149524 (716.3 MiB)  TX bytes:740573200 (706.2 MiB)

venet0:0  Link encap:UNSPEC  HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
  inet addr:198.204.250.34  P-t-P:198.204.250.34
Bcast:0.0.0.0  Mask:255.255.255.255
  UP BROADCAST POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1

and it seems this all-zeros MAC address is a common
(mis-?)configuration on OpenVZ:

https://github.com/robertdavidgraham/masscan/issues/43
http://stackoverflow.com/questions/5838225/how-do-i-get-a-default-gridgain-node-in-openvz-discover-other-nodes-on-the-same
http://forum.openvz.org/index.php?t=msg&goto=8117
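For what it's worth, the all-zero case is easy to spot from sysfs on a Linux guest (assuming the usual /sys/class/net layout):

```shell
# List interfaces whose reported MAC is all zeros -- the value the
# UUID library ignores.  The loopback device normally shows this too,
# and on OpenVZ the venet devices do as well.
for f in /sys/class/net/*/address; do
    iface=${f%/address}
    iface=${iface##*/}
    mac=$(cat "$f")
    case $mac in
        00:00:00:00:00:00*) echo "$iface: all-zero hardware address" ;;
    esac
done
```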




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 6:19 PM, Tom Lane  wrote:
> Claudio Freire  writes:
>> Didn't fix that, but the attached patch does fix regression tests when
>> scanning over index types other than btree (was invoking elog when the
>> index am didn't have ampeeknexttuple)
>
> "ampeeknexttuple"?  That's a bit scary.  It would certainly be unsafe
> for non-MVCC snapshots (read about vacuum vs indexscan interlocks in
> nbtree/README).


It's not really the tuple, just the tid




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 6:43 PM, Claudio Freire  wrote:
> On Thu, May 29, 2014 at 6:19 PM, Tom Lane  wrote:
>> Claudio Freire  writes:
>>> Didn't fix that, but the attached patch does fix regression tests when
>>> scanning over index types other than btree (was invoking elog when the
>>> index am didn't have ampeeknexttuple)
>>
>> "ampeeknexttuple"?  That's a bit scary.  It would certainly be unsafe
>> for non-MVCC snapshots (read about vacuum vs indexscan interlocks in
>> nbtree/README).
>
>
> It's not really the tuple, just the tid

And, furthermore, it's used only to do prefetching, so even if the tid
was invalid when the tuple needs to be accessed, it wouldn't matter,
because the indexam wouldn't use the result of ampeeknexttuple to do
anything at that time.

Though, your comment does illustrate the need to document that on
ampeeknexttuple, for future users.




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread John Lumby


> Date: Thu, 29 May 2014 18:00:28 -0300
> Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
> and patch
> From: klaussfre...@gmail.com
> To: hlinnakan...@vmware.com
> CC: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
> 
> On Thu, May 29, 2014 at 5:39 PM, Heikki Linnakangas
>  wrote:
> > On 05/29/2014 11:34 PM, Claudio Freire wrote:
> >>
> >> On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
> >>  wrote:
> >>>
> >>> On 05/29/2014 04:12 PM, John Lumby wrote:
> 
> 
> > On 05/28/2014 11:52 PM, John Lumby wrote:
> >
> > The patch seems to assume that you can put the aiocb struct in shared
> > memory, initiate an asynchronous I/O request from one process, and wait
> > for its completion from another process. I'm pretty surprised if that
> > works on any platform.
> 
> 
>  It works on linux.Actually this ability allows the asyncio
>  implementation to
>  reduce complexity in one respect (yes I know it looks complex enough) :
>  it makes waiting for completion of an in-progress IO simpler than for
>  the existing synchronous IO case,.   since librt takes care of the
>  waiting.
>  specifically,   no need for extra wait-for-io control blocks
>  such as in bufmgr's  WaitIO()
> >>>
> >>>
> >>> [checks]. No, it doesn't work. See attached test program.

Thanks for checking, and thanks for coming up with that test program.
However, yes, it really does work -- always (on Linux).
Your test program is doing things in the wrong order -
it calls aio_suspend *before* aio_error.
However, the rule is: call aio_suspend *after* aio_error,
and *only* if aio_error returns EINPROGRESS.

See the code changes to fd.c function FileCompleteaio()
to see how we have done it.  And I am attaching a corrected version
of your test program, which runs just fine.
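The ordering described here -- aio_error() first, aio_suspend() only while the request is still EINPROGRESS -- can be sketched as follows. This is a hypothetical helper for illustration, not the actual FileCompleteaio() code from the patch:

```c
#include <aio.h>
#include <errno.h>
#include <time.h>

/* Wait for an AIO request to complete: consult aio_error() first, and
 * call aio_suspend() only while it still reports EINPROGRESS.
 * Returns 0 on success, otherwise an errno-style code.
 * Illustrative only; names and the 10us interval follow the discussion. */
static int
wait_for_aio(struct aiocb *cb)
{
	const struct aiocb *list[1] = { cb };
	int		err;

	while ((err = aio_error(cb)) == EINPROGRESS)
	{
		struct timespec ts = { 0, 10000 };	/* 10us timeout per wait */

		(void) aio_suspend(list, 1, &ts);	/* may time out; loop re-checks */
	}
	return err;
}
```

The caller then retrieves the byte count with aio_return() exactly once, after wait_for_aio() returns 0.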


> >>>
> >>> It kinda seems to work sometimes, because of the way it's implemented in
> >>> glibc. The aiocb struct has a field for the result value and errno, and
> >>> when
> >>> the I/O is finished, the worker thread fills them in. aio_error() and
> >>> aio_return() just return the values of those fields, so calling
> >>> aio_error()
> >>> or aio_return() do in fact happen to work from a different process.
> >>> aio_suspend(), however, is implemented by sleeping on a process-local
> >>> mutex,
> >>> which does not work from a different process.
> >>>
> >>> Even if it worked on Linux today, it would be a bad idea to rely on it
> >>> from
> >>> a portability point of view. No, the only sane way to make this work is
> >>> that
> >>> the process that initiates an I/O request is responsible for completing
> >>> it.
> >>> If another process needs to wait for an async I/O to complete, we must
> >>> use
> >>> some other means to do the waiting. Like the io_in_progress_lock that we
> >>> already have, for the same purpose.
> >>
> >>
> >> But calls to it are timeouted by 10us, effectively turning the thing
> >> into polling mode.
> >
> >
> > We don't want polling... And even if we did, calling aio_suspend() in a way
> > that's known to be broken, in a loop, is a pretty crappy way of polling.

Well, as mentioned earlier, it is not broken.  Whether it is efficient I
am not sure.
I have looked at the mutex in aio_suspend that you mentioned, and I am not
quite convinced that, if the caller is not the original aio_read process,
it renders the suspend() into an instant timeout.  I will see if I can
verify that.
Where are you (Claudio) seeing 10us?

> 
> 
> Didn't fix that, but the attached patch does fix regression tests when
> scanning over index types other than btree (was invoking elog when the
> index am didn't have ampeeknexttuple)
  /*
 * Test program to test if POSIX aio functions work across processes
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <aio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>

char *shmem;

void
processA(void)
{
int fd;
struct aiocb *aiocbp = (struct aiocb *) shmem;
char *buf = shmem + sizeof(struct aiocb);

fd = open("aio-shmem-test-file", O_CREAT | O_WRONLY | O_SYNC, S_IRWXU);
if (fd == -1)
{
fprintf(stderr, "open() failed\n");
exit(1);
}
printf("processA starting AIO\n");

strcpy(buf, "foobar");

memset(aiocbp, 0, sizeof(struct aiocb));
aiocbp->aio_fildes = fd;
aiocbp->aio_offset = 0;
aiocbp->aio_buf = buf;
aiocbp->aio_nbytes = strlen(buf);
aiocbp->aio_reqprio = 0;
aiocbp->aio_sigevent.sigev_notify = SIGEV_NONE;

if (aio_write(aiocbp) != 0)
{
fprintf(stderr, "aio_write() failed\n");
exit(1);
}
}

void
processB(void)
{
struct aiocb *aiocbp = (struct aiocb *) shmem;
const struct aiocb * const pl[1] = { aiocbp };
int rv;
int returnCode;
struc

Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Tom Lane
Claudio Freire  writes:
> On Thu, May 29, 2014 at 6:43 PM, Claudio Freire  
> wrote:
>> On Thu, May 29, 2014 at 6:19 PM, Tom Lane  wrote:
>>> "ampeeknexttuple"?  That's a bit scary.  It would certainly be unsafe
>>> for non-MVCC snapshots (read about vacuum vs indexscan interlocks in
>>> nbtree/README).

>> It's not really the tuple, just the tid

> And, furthermore, it's used only to do prefetching, so even if the tid
> was invalid when the tuple needs to be accessed, it wouldn't matter,
> because the indexam wouldn't use the result of ampeeknexttuple to do
> anything at that time.

Nonetheless, getting the next tid out of the index may involve stepping
to the next index page, at which point you've lost your interlock
guaranteeing that the *previous* tid will still mean something by the time
you arrive at its heap page.  I presume that the ampeeknexttuple call is
issued before trying to visit the heap (otherwise you're not actually
getting much I/O overlap), so I think there's a real risk here.

Having said that, it's probably OK as long as this mode is only invoked
for user queries (with MVCC snapshots) and not for system indexscans.

regards, tom lane




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread John Lumby


> From: t...@sss.pgh.pa.us
> To: klaussfre...@gmail.com
> CC: hlinnakan...@vmware.com; johnlu...@hotmail.com; 
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
> and patch
> Date: Thu, 29 May 2014 17:56:57 -0400
> 
> Claudio Freire  writes:
> > On Thu, May 29, 2014 at 6:43 PM, Claudio Freire  
> > wrote:
> >> On Thu, May 29, 2014 at 6:19 PM, Tom Lane  wrote:
> >>> "ampeeknexttuple"?  That's a bit scary.  It would certainly be unsafe
> >>> for non-MVCC snapshots (read about vacuum vs indexscan interlocks in
> >>> nbtree/README).
> 
> >> It's not really the tuple, just the tid
> 
> > And, furthermore, it's used only to do prefetching, so even if the tid
> > was invalid when the tuple needs to be accessed, it wouldn't matter,
> > because the indexam wouldn't use the result of ampeeknexttuple to do
> > anything at that time.
> 
> Nonetheless, getting the next tid out of the index may involve stepping
> to the next index page, at which point you've lost your interlock

I think we are ok, as peeknexttuple (yes, bad name, sorry, can change it ...)
never advances to another page:

 * btpeeknexttuple() -- peek at the next tuple different from any blocknum
 *		in pfch_list, without reading a new index page
 *		and without causing any side-effects such as altering
 *		values in control blocks;
 *		if found, store blocknum in next element of pfch_list


> guaranteeing that the *previous* tid will still mean something by the time
> you arrive at its heap page.  I presume that the ampeeknexttuple call is
> issued before trying to visit the heap (otherwise you're not actually
> getting much I/O overlap), so I think there's a real risk here.
> 
> Having said that, it's probably OK as long as this mode is only invoked
> for user queries (with MVCC snapshots) and not for system indexscans.
> 
>   regards, tom lane
  

Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 6:56 PM, Tom Lane  wrote:
> Claudio Freire  writes:
>> On Thu, May 29, 2014 at 6:43 PM, Claudio Freire  
>> wrote:
>>> On Thu, May 29, 2014 at 6:19 PM, Tom Lane  wrote:
 "ampeeknexttuple"?  That's a bit scary.  It would certainly be unsafe
 for non-MVCC snapshots (read about vacuum vs indexscan interlocks in
 nbtree/README).
>
>>> It's not really the tuple, just the tid
>
>> And, furthermore, it's used only to do prefetching, so even if the tid
>> was invalid when the tuple needs to be accessed, it wouldn't matter,
>> because the indexam wouldn't use the result of ampeeknexttuple to do
>> anything at that time.
>
> Nonetheless, getting the next tid out of the index may involve stepping
> to the next index page, at which point you've lost your interlock
> guaranteeing that the *previous* tid will still mean something by the time

No, no... that's exactly why a new regproc is needed, because for
prefetching, we need to get the next tid that satisfies some
conditions *without* walking the index.

This, in nbtree, only looks through the tid array to find the suitable
tid, or just return false if the array is exhausted.

> Having said that, it's probably OK as long as this mode is only invoked
> for user queries (with MVCC snapshots) and not for system indexscans.

I think system index scans will also invoke this. There's no rule
excluding that possibility.




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 7:11 PM, John Lumby  wrote:
>> Nonetheless, getting the next tid out of the index may involve stepping
>> to the next index page, at which point you've lost your interlock
>
> I think we are ok as peeknexttuple (yes  bad name,  sorry,  can change it
> ...
> never advances to another page  :
>
>  *btpeeknexttuple() -- peek at the next tuple different from any
> blocknum in pfch_list
>  *   without reading a new index page
>  *   and without causing any side-effects such as
> altering values in control blocks
>  *   if found, store blocknum in next element of pfch_list


Yeah, I was getting to that conclusion myself too.

We could call it amprefetchnextheap, since it does just prefetch, and
is good for nothing *but* prefetch.




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Andres Freund
Hi,

On 2014-05-29 17:53:51 -0400, John Lumby wrote:
> to see how we have done it.  And I am attaching a corrected version
> of your test program, which runs just fine.

It's perfectly fine to not be up to the coding style at this point, but
trying to adhere to it to some degree will make code review later less
painful...
* comments with **
* line length
* tabs vs spaces
* ...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread John Lumby


> Date: Thu, 29 May 2014 18:00:28 -0300
> Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal 
> and patch
> From: klaussfre...@gmail.com
> To: hlinnakan...@vmware.com
> CC: johnlu...@hotmail.com; pgsql-hackers@postgresql.org
> 

> >>>
> >>> Even if it worked on Linux today, it would be a bad idea to rely on it
> >>> from
> >>> a portability point of view. No, the only sane way to make this work is
> >>> that
> >>> the process that initiates an I/O request is responsible for completing
> >>> it.


I meant to add - it is really a significant benefit that a bkend
can wait on the aio of a different bkend's original prefetching aio_read.
Remember that we check completion only when the bkend decides it really
wants the block in a buffer, i.e. ReadBuffer and friends,
which might be a very long time after it had issued the prefetch request,
or even never (see below).  We don't want other bkends which want that
block to have to wait for the originator to get around to reading it.
*Especially* since the originator may *never* read it, if it quits its scan
early, leaving prefetched but unread blocks behind.  (Which is also taken
care of in the patch.)


> >>> If another process needs to wait for an async I/O to complete, we must
> >>> use
> >>> some other means to do the waiting. Like the io_in_progress_lock that we
> >>> already have, for the same purpose.


  

Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 6:53 PM, John Lumby  wrote:
> Well,  as mentioned earlier,  it is not broken. Whether it is efficient
> I am not sure.
> I have looked at the mutex in aio_suspend that you mentioned and I am not
> quite convinced that,  if caller is not the original aio_read process,
> it renders the suspend() into an instant timeout.  I will see if I can
> verify that.
> Where are you (Claudio) seeing 10us?


fd.c, in FileCompleteaio, sets timeout to:

my_timeout.tv_sec = 0; my_timeout.tv_nsec = 10000;

Which is 10k ns, which is 10 us.

It loops 256 times at most, so it's polling 256 times with a 10 us
timeout. Sounds wasteful.

I'd:

1) If it's the same process, wait for the full timeout (no looping).
If you have to loop (EAGAIN or EINTR - which I just noticed you don't
check for), that's ok.

2) If it's not, just fall through, don't wait, issue the I/O. The
kernel will merge the requests.




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-05-29 Thread Claudio Freire
On Thu, May 29, 2014 at 7:26 PM, Claudio Freire  wrote:
> 1) If it's the same process, wait for the full timeout (no looping).
> If you have to loop (EAGAIN or EINTR - which I just noticed you don't
> check for), that's ok.

Sorry, meant to say just looping on EINTR.

About the style guidelines, no, I just copy the style of surrounding
code usually.




Re: [HACKERS] Odd uuid-ossp behavior on smew and shearwater

2014-05-29 Thread Andrew Dunstan


On 05/29/2014 05:41 PM, Josh Kupershmidt wrote:

On Thu, May 29, 2014 at 4:06 PM, Andrew Dunstan  wrote:

Almost all my critters run in VMs (all but jacana and bowerbird).

They're not running on OpenVZ, are they? `ifconfig` on shearwater says:


[...]

and it seems this all-zeros MAC address is a common
(mis-?)configuration on OpenVZ:

https://github.com/robertdavidgraham/masscan/issues/43
http://stackoverflow.com/questions/5838225/how-do-i-get-a-default-gridgain-node-in-openvz-discover-other-nodes-on-the-same
http://forum.openvz.org/index.php?t=msg&goto=8117





Yeah, that's a bit ugly. Mine are on qemu, one (sitella) is on Xen.

cheers

andrew





Re: [HACKERS] Re: Logical slots not mentioned in CREATE_REPLICATION_SLOT for replication protocol docs

2014-05-29 Thread Michael Paquier
On Fri, May 30, 2014 at 5:31 AM, Robert Haas  wrote:
> Thanks, this looks good.  But shouldn't the bit about output plugin
> options mention say something like:
>
> ( option_name option_argument [, ...] )
>
> ...instead of just:
>
> ( option [, ...] )
> ?
Yes, true. Here is an updated patch.
-- 
Michael
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 3a2421b..43861d0 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1428,10 +1428,10 @@ The commands accepted in walsender mode are:
   
 
   
-CREATE_REPLICATION_SLOT slotname PHYSICALCREATE_REPLICATION_SLOT
+CREATE_REPLICATION_SLOT slotname { PHYSICAL | LOGICAL output_plugin } CREATE_REPLICATION_SLOT
 
  
-  Create a physical replication
+  Create a physical or logical replication
   slot. See  for more about
   replication slots.
  
@@ -1445,6 +1445,16 @@ The commands accepted in walsender mode are:
  

   
+
+  
+   output_plugin
+   
+ 
+  The name of the output plugin used for logical decoding
+  (see ).
+ 
+   
+  
  
 
   
@@ -1778,7 +1788,7 @@ The commands accepted in walsender mode are:
 
   
   
-START_REPLICATION SLOT slotname LOGICAL XXX/XXX
+START_REPLICATION SLOT slotname LOGICAL XXX/XXX [ ( option_name value [, ... ] ) ]
 
  
   Instructs server to start streaming WAL for logical replication, starting
@@ -1811,6 +1821,22 @@ The commands accepted in walsender mode are:
 

   
+  
+   option_name
+   
+
+ Custom option name for logical decoding plugin.
+
+   
+  
+  
+   value
+   
+
+ Value associated with given option for logical decoding plugin.
+
+   
+  
  
 
   



[HACKERS] json/jsonb and unicode escapes

2014-05-29 Thread Andrew Dunstan


Here is a draft patch for some of the issues to do with unicode escapes 
that Teodor raised the other day.


I think it does the right thing, although I want to add a few more 
regression cases before committing it.


Comments welcome.

cheers

andrew
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index a7364f3..47ab9be 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -2274,7 +2274,27 @@ escape_json(StringInfo buf, const char *str)
 appendStringInfoString(buf, "\\\"");
 break;
 			case '\\':
-				appendStringInfoString(buf, "\\\\");
+				/*
+				 * Unicode escapes are passed through as is. There is no
+				 * requirement that they denote a valid character in the
+				 * server encoding - indeed that is a big part of their
+				 * usefulness.
+				 *
+				 * All we require is that they consist of \uXXXX where
+				 * the Xs are hexadecimal digits. It is the responsibility
+				 * of the caller of, say, to_json() to make sure that the
+				 * unicode escape is valid.
+				 *
+				 * In the case of a jsonb string value being escaped, the
+				 * only unicode escape that should be present is \u0000,
+				 * all the other unicode escapes will have been resolved.
+				 *
+				 */
+				if (p[1] == 'u' && isxdigit(p[2]) && isxdigit(p[3])
+					&& isxdigit(p[4]) && isxdigit(p[5]))
+					appendStringInfoCharMacro(buf, *p);
+				else
+					appendStringInfoString(buf, "\\\\");
 break;
 			default:
 if ((unsigned char) *p < ' ')
diff --git a/src/test/regress/expected/jsonb.out b/src/test/regress/expected/jsonb.out
index ae7c506..1e46939 100644
--- a/src/test/regress/expected/jsonb.out
+++ b/src/test/regress/expected/jsonb.out
@@ -61,9 +61,9 @@ LINE 1: SELECT '"\u000g"'::jsonb;
 DETAIL:  "\u" must be followed by four hexadecimal digits.
 CONTEXT:  JSON data, line 1: "\u000g...
 SELECT '"\u0000"'::jsonb;		-- OK, legal escape
-   jsonb   
------------
- "\\u0000"
+  jsonb   
+----------
+ "\u0000"
 (1 row)
 
 -- use octet_length here so we don't get an odd unicode char in the
diff --git a/src/test/regress/expected/jsonb_1.out b/src/test/regress/expected/jsonb_1.out
index 38a95b4..955dc42 100644
--- a/src/test/regress/expected/jsonb_1.out
+++ b/src/test/regress/expected/jsonb_1.out
@@ -61,9 +61,9 @@ LINE 1: SELECT '"\u000g"'::jsonb;
 DETAIL:  "\u" must be followed by four hexadecimal digits.
 CONTEXT:  JSON data, line 1: "\u000g...
 SELECT '"\u0000"'::jsonb;		-- OK, legal escape
-   jsonb   
------------
- "\\u0000"
+  jsonb   
+----------
+ "\u0000"
 (1 row)
 
 -- use octet_length here so we don't get an odd unicode char in the



Re: [HACKERS] recovery testing for beta

2014-05-29 Thread Amit Kapila
On Thu, May 29, 2014 at 10:09 PM, Jeff Janes  wrote:
>
> What features in 9.4 need more beta testing for recovery?

Another feature which has an interaction with recovery is the reduced WAL
for update operations:
http://www.postgresql.org/message-id/e1wnqjm-0003cz...@gemulon.postgresql.org

Commit: a3115f

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] fix worker_spi to run as non-dynamic background worker

2014-05-29 Thread Robert Haas
On Tue, May 27, 2014 at 12:11 AM, Michael Paquier
 wrote:
> The feature itself is already mentioned in the release notes in a very
> general manner, by telling that bgworkers can be dynamically
> "registered" and "started". Users can still refer to the documentation
> to see more precisely what kind of infrastructure is in place in 9.4.
>
> So... As I'm on it... What about something like the attached patch?

Hearing no comment from Bruce, committed.  Bruce, feel free to modify
or revert if you don't think this is appropriate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Spreading full-page writes

2014-05-29 Thread Robert Haas
On Tue, May 27, 2014 at 8:15 AM, Heikki Linnakangas
 wrote:
> Since you will be flushing the buffers one "redo partition" at a time, you
> would want to allow the OS to merge the writes within a partition as much
> as possible. So my even-odd split would in fact be pretty bad. Some sort of
> striping, e.g. mapping each contiguous 1 MB chunk to the same partition,
> would be better.

I suspect you'd actually want to stripe by segment (1GB partition).
If you striped by 1MB partitions, there might still be writes to the
parts of the file you weren't checkpointing that would be flushed by
the fsync().  That would lead to more physical I/O overall, if those
pages were written again before you did the next half-checkpoint.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] postgres_fdw and connection management

2014-05-29 Thread Robert Haas
On Mon, May 26, 2014 at 10:47 PM, Shigeru Hanada
 wrote:
> 2014-05-24 0:09 GMT+09:00 Sandro Santilli :
>> Indeed I tried "DISCARD ALL" in hope it would have helped, so I find
>> good your idea of allowing extensions to register an hook there.
>>
>> Still, I'd like the FDW handler itself to possibly be configured
>> to disable the pool completely as a server-specific configuration.
>
> Connection management seems FDW-specific feature to me.  How about to
> add FDW option, say pool_connection=true|false, to postgres_fdw which
> allows per-server configuration?

Right... or you could have an option to close the connection at
end-of-statement, end-of-transaction, or end-of-session.  But quite
apart from that, it seems like there ought to be a way to tell an FDW
to flush its state.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_sleep() doesn't work well with recovery conflict interrupts.

2014-05-29 Thread Amit Kapila
On Wed, May 28, 2014 at 8:53 PM, Andres Freund 
wrote:
> Hi,
>
> Since a64ca63e59c11d8fe6db24eee3d82b61db7c2c83 pg_sleep() uses
> WaitLatch() to wait. That's fine in itself. But
> procsignal_sigusr1_handler, which is used e.g. when resolving recovery
> conflicts, doesn't unconditionally do a SetLatch().
> That means that we'll currently not be able to cancel conflicting
> backends during recovery for 10 min. Now, I don't think that'll happen
> too often in practice, but it's still annoying.

How will such a situation occur?  Aren't we using pg_usleep during the
recovery conflict functions
(e.g. in ResolveRecoveryConflictWithVirtualXIDs)?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com