Re: Review: [HACKERS] ECPG infrastructure changes part 1, was: Re: ECPG fixes

2014-01-17 Thread Boszormenyi Zoltan

On 2014-01-16 22:13, Alvaro Herrera wrote:

Alvaro Herrera wrote:

Boszormenyi Zoltan wrote:


All patches are attached again for completeness.

Thanks.  I pushed a commit comprising patches 09 through 14.

Now also pushed 15 to 17.


Thanks very much.





Re: [HACKERS] ECPG FETCH readahead, was: Re: ECPG fixes

2014-01-17 Thread Boszormenyi Zoltan

On 2014-01-16 23:42, Alvaro Herrera wrote:

Boszormenyi Zoltan wrote:

Rebased patches after the regression test and other details were fixed
in the infrastructure part.

This thread started in 2010, and various pieces have been applied
already and some others have changed in nature.  Would you please post a
new patchset, containing rebased patches that still need application, in
a new email thread to be linked in the commitfest entry?


I will do that.

Best regards,
Zoltán Böszörményi





[HACKERS] Funny representation in pg_stat_statements.query.

2014-01-17 Thread Kyotaro HORIGUCHI
Hello, I noticed that pg_stat_statements.query can have funny values.

| =# select pg_stat_statements_reset();
| =# select current_timestamp(0);
| snip
| =# select query from pg_stat_statements;
|                query
| -------------------------------------
|  select ?(0);
|  select pg_stat_statements_reset();

The same thing happens for CURRENT_DATE, CURRENT_TIME, LOCALTIME
and LOCALTIMESTAMP, which are specially treated, that is,
immediately replaced with a combination of a source function and
a typecast in gram.y.

The story is as follows,

At first, for instance, CURRENT_TIMESTAMP(0) is replaced with
'now()::timestamptz(0)' and CURRENT_TIMESTAMP's location is
used as that of 'now' and the location of the const '1' of
LOCALTIME is used as that of the const '1' for 'timestamptz'.

Let's assume the original query was,

| pos: Query
|  0: SELECT
|  7: CURRENT_TIMESTAMP(
| 25: 0
|   : );

This is parsed in gram.y as follows; the location of
'CURRENT_TIMESTAMP' above (7) is used as the location of the
Const 'now'.

| TypeCast {
|   TypeCast {
| A_Const(now)
| TypeName {names = [text], loc = -1 (undef) }
| loc = 7
|   }
|   Typename {names = [timestamptz], loc = -1 }
|   typemods = [Const {1, loc = 25}]
| }

Then this is transformed into,

| FuncExpr {
|   funcid = 'timestamptz'
|   args = [
| CoerceViaIO {
|   arg = Const {type = text, value = now, loc = 7 }
|   loc = -1
| }
| Const { type = int4, value = 1, loc = -1 }
|   ]
| }

Finally, pg_stat_statements picks the location '7' as the location
of some constant and replaces the token there with '?', even though
it is originally the location of 'CURRENT_TIMESTAMP', which is not
a constant from the user's point of view.

I found two ways to fix this issue. Both of them prevent the wrong
masking, but the original constant parameter ('0') will no longer
be masked. This behavior seems sane enough for the purpose.
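
To illustrate the expected effect (a sketch only; the exact text depends
on which of the two patches is applied), the stored query text would
change roughly as follows:

  select ?(0);                   -- current behavior: the keyword itself is masked
  select current_timestamp(0);   -- with either fix: the keyword and the parameter survive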

A. Making pg_stat_statements conscious of the token types to
   prevent wrong masking.

  20140117_remove_needless_location_setting.patch

B. Letting gram.y not set the location for the problematic nodes.

  20140117_skip_nonconstants_on_nomalization.patch
  

I don't have a firm idea of which is preferable, or whether there is
another possible solution.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index f0b9507..30da7aa 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -11355,7 +11355,7 @@ func_expr_common_subexpr:
 	 * to rely on it.)
 	 */
 	Node *n;
-	n = makeStringConstCast(now, @1, SystemTypeName(text));
+	n = makeStringConstCast(now, -1, SystemTypeName(text));
 	$$ = makeTypeCast(n, SystemTypeName(date), -1);
 }
 			| CURRENT_TIME
@@ -11365,7 +11365,7 @@ func_expr_common_subexpr:
 	 * See comments for CURRENT_DATE.
 	 */
 	Node *n;
-	n = makeStringConstCast(now, @1, SystemTypeName(text));
+	n = makeStringConstCast(now, -1, SystemTypeName(text));
 	$$ = makeTypeCast(n, SystemTypeName(timetz), -1);
 }
 			| CURRENT_TIME '(' Iconst ')'
@@ -11376,7 +11376,7 @@ func_expr_common_subexpr:
 	 */
 	Node *n;
 	TypeName *d;
-	n = makeStringConstCast(now, @1, SystemTypeName(text));
+	n = makeStringConstCast(now, -1, SystemTypeName(text));
 	d = SystemTypeName(timetz);
 	d->typmods = list_make1(makeIntConst($3, @3));
 	$$ = makeTypeCast(n, d, -1);
@@ -11397,7 +11397,7 @@ func_expr_common_subexpr:
 	 */
 	Node *n;
 	TypeName *d;
-	n = makeStringConstCast(now, @1, SystemTypeName(text));
+	n = makeStringConstCast(now, -1, SystemTypeName(text));
 	d = SystemTypeName(timestamptz);
 	d->typmods = list_make1(makeIntConst($3, @3));
 	$$ = makeTypeCast(n, d, -1);
@@ -11409,7 +11409,7 @@ func_expr_common_subexpr:
 	 * See comments for CURRENT_DATE.
 	 */
 	Node *n;
-	n = makeStringConstCast(now, @1, SystemTypeName(text));
+	n = makeStringConstCast(now, -1, SystemTypeName(text));
 	$$ = makeTypeCast((Node *)n, SystemTypeName(time), -1);
 }
 			| LOCALTIME '(' Iconst ')'
@@ -11420,7 +11420,7 @@ func_expr_common_subexpr:
 	 */
 	Node *n;
 	TypeName *d;
-	n = makeStringConstCast(now, @1, SystemTypeName(text));
+	n = makeStringConstCast(now, -1, SystemTypeName(text));
 	d = SystemTypeName(time);
 	d->typmods = list_make1(makeIntConst($3, @3));
 	$$ = makeTypeCast((Node *)n, d, -1);
@@ -11432,7 +11432,7 @@ func_expr_common_subexpr:
 	 * See comments for CURRENT_DATE.
 	 */
 	Node *n;
-	n = makeStringConstCast(now, @1, SystemTypeName(text));
+	n = makeStringConstCast(now, -1, SystemTypeName(text));
 	$$ = makeTypeCast(n, SystemTypeName(timestamp), -1);
 }
 			| LOCALTIMESTAMP '(' Iconst ')'
@@ -11443,7 +11443,7 @@ func_expr_common_subexpr:
 	 */
 	Node *n;
 	TypeName *d;
-	n = makeStringConstCast(now, @1, SystemTypeName(text));
+	n = 

[HACKERS] improve the help message about psql -F

2014-01-17 Thread Jov
In the official doc, -F says:

 Use separator as the field separator for unaligned output.


but in psql --help, -F says:

 set field separator (default: |)


If the user doesn't read the official doc carefully, he can use:

psql -F , -c 'select ...'

But he won't get what he wants.
It is a bad user experience.
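
For reference, -F only takes effect in unaligned output mode, so an
invocation along these lines (illustrative) does what the user expects:

psql -A -F ',' -c 'select ...'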

I made a patch changing the help message for -F to:

 set field separator for unaligned output (default: |)


Patch for HEAD attached.

Jov
blog: http://amutu.com/blog


filed_separator_set_help_imporove.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] plpgsql.warn_shadow

2014-01-17 Thread Pavel Stehule
Hello

After some thinking, I don't think this design is good. It changes how
exception (error) levels work, and it is not consistent with
other PostgreSQL parts.

The benefit is smaller than the cost of an unclean configuration. It
would be better to solve similar issues via specialized plpgsql
extensions, or to help me push plpgsql_check_function into core. That
can be the best home for this and similar checks.

Regards

Pavel




2014/1/15 Marko Tiikkaja ma...@joh.to

 On 1/15/14 3:09 PM, Pavel Stehule wrote:

 You first should to say, what is warning and why it is only warning and
 not
 error.


 Personally, I'm a huge fan of the computer telling me that I might have
 made a mistake.  But on the other hand, I also really hate it when the
 computer gets in my way when I'm trying to test something quickly and
 making these mistakes on purpose.  Warnings are really good for that: hard
 to ignore (YMMV) accidentally, but easy to spot when developing.

 As to how we would categorize these checks between warnings and errors..
  I can't really answer that.  I'm tempted to say anything that is an error
 now is an error, any additional checks we might add are warnings, but
 that's effectively just freezing the definition at an arbitrary point in
 time.


  And why plpgsql warning processing should be different than general
 postgresql processing?


 What do you mean?  We're adding extra checks on *top* of the normal this
 is clearly an error conditions.  PostgreSQL in general doesn't really do
 that.  Consider:

   SELECT * FROM foo WHERE fooid IN (SELECT fooid FROM bar);

 where the table bar doesn't have a column fooid.  That's a perfectly
 valid query, but it almost certainly doesn't do what you would want.
 Personally I'd like to see a WARNING here normally, but I've eventually
 learned to live with this caveat.  I'm hoping that in PL/PgSQL we could at
 least solve some of the most annoying pitfalls.
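
 To spell the example out (table definitions are hypothetical, for
 illustration only):

   CREATE TABLE foo (fooid int);
   CREATE TABLE bar (barid int);   -- note: no fooid column
   SELECT * FROM foo WHERE fooid IN (SELECT fooid FROM bar);

 The subquery's fooid silently resolves to the outer foo.fooid, so the
 query is accepted and, whenever bar is non-empty, effectively matches
 every row of foo with a non-NULL fooid -- which is almost never what
 was intended.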


  My objection is against too general option. Every option shoudl to do one
 clean thing.


 It looks to me like the GUC *is* doing only one thing.  list of warnings
 I want to see, or the shorthand all for convenience.


 Regards,
 Marko Tiikkaja



Re: [HACKERS] plpgsql.warn_shadow

2014-01-17 Thread Marko Tiikkaja

On 1/17/14 11:26 AM, Pavel Stehule wrote:

After some thinking I don't think so this design is not good. It  changing
a working with exception (error) levels - and it is not consistent with
other PostgreSQL parts.

A benefit is less than not clean configuration. Better to solve similar
issues via specialized plpgsql extensions or try to help me push
plpgsql_check_function to core. It can be a best holder for this and
similar checks.


I see these as being two different things.  There *is* a need for a 
full-blown static analyzer for PL/PgSQL, but I don't think it needs to 
be in core.  However, there seem to be a number of pitfalls we could 
warn the user about with little effort in core, and I think those are 
worth doing.



Regards,
Marko Tiikkaja




Re: [HACKERS] GIN improvements part 1: additional information

2014-01-17 Thread Alexander Korotkov
On Wed, Jan 15, 2014 at 10:47 AM, Alexander Korotkov
aekorot...@gmail.com wrote:

 On Wed, Jan 15, 2014 at 5:17 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 14.1.2014 00:38, Tomas Vondra wrote:
  On 13.1.2014 18:07, Alexander Korotkov wrote:
  On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra t...@fuzzy.cz
  mailto:t...@fuzzy.cz wrote:
 
  On 8.1.2014 22:58, Alexander Korotkov wrote:
   Thanks for reporting. Fixed version is attached.
 
  I've tried to rerun the 'archie' benchmark with the current patch,
 and
  once again I got
 
 PANIC:  could not split GIN page, didn't fit
 
  I reran it with '--enable-cassert' and with that I got
 
  TRAP: FailedAssertion(!(ginCompareItemPointers(items[i - 1],
 items[i]) < 0), File: gindatapage.c, Line:
 149)
  LOG:  server process (PID 5364) was terminated by signal 6: Aborted
  DETAIL:  Failed process was running: INSERT INTO messages ...
 
  so the assert in GinDataLeafPageGetUncompressed fails for some
 reason.
 
  I can easily reproduce it, but my knowledge in this area is rather
  limited so I'm not entirely sure what to look for.
 
 
  I've fixed this bug and many other bug. Now patch passes test suite
 that
  I've used earlier. The results are so:
 
  OK, it seems the bug is gone. However now there's a memory leak
  somewhere. I'm loading pgsql mailing list archives (~600k messages)
  using this script
 
 https://bitbucket.org/tvondra/archie/src/1bbeb920/bin/load.py
 
  And after loading about 1/5 of the data, all the memory gets filled by
  the pgsql backends (loading the data in parallel) and the DB gets killed
  by the OOM killer.

 I've spent a fair amount of time trying to locate the memory leak, but
 so far no luck. I'm not sufficiently familiar with the GIN code.

 I can however demonstrate that it's there, and I have rather simple test
 case to reproduce it - basically just a CREATE INDEX on a table with ~1M
 email message bodies (in a tsvector column). The data is available here
 (360MB compressed, 1GB raw):

http://www.fuzzy.cz/tmp/message-b.data.gz

 Simply create a single-column table, load data and create the index

CREATE TABLE test ( body_tsvector TSVECTOR );
COPY test FROM '/tmp/message-b.data';
CREATE INDEX test_idx ON test USING gin ( body_tsvector );

 I'm running this on a machine with 8GB of RAM, with these settings

shared_buffers=1GB
maintenance_work_mem=1GB

 According to top, CREATE INDEX from the current HEAD never consumes more
 than ~25% of RAM:

    PID USER      PR  NI    VIRT    RES    SHR   %CPU %MEM  COMMAND
  32091 tomas     20   0 2026032 1,817g 1,040g   56,2 23,8  postgres

 which is about right, as (shared_buffers + maintenance_work_mem) is
 about 1/4 of RAM.

 With the v5 patch version applied, the CREATE INDEX process eventually
 goes crazy and allocates almost all the available memory (but somesimes
 finishes, mostly by pure luck). This is what I was able to get from top

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM  COMMAND
  14090 tomas     20   0 7913820 6,962g 955036 D   4,3 91,1  postgres

 while the system was still reasonably responsive.


 Thanks a lot for your help! I believe problem is that each decompressed
 item pointers array is palloc'd but not freed. I hope to fix it today.


Seems to be fixed in the attached version of the patch.
Another improvement in this version: only the leftmost modified segment and
those following it are re-encoded; the left part of the page is left untouched.

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-varbyte6.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [PATCH] pgcrypto: implement gen_random_uuid

2014-01-17 Thread Emre Hasegeli
2014/1/9 Oskari Saarenmaa o...@ohmu.fi:
 The only useful feature of the uuid-ossp module in my opinion is the
 uuid_generate_v4 function and as uuid-ossp is more or less abandonware
 people have had trouble building and installing it.  This patch implements
 an alternative uuid v4 generation function in pgcrypto which could be moved
 to core once there's a core PRNG with large enough internal state.

It is a small but very useful patch. Installing uuid-ossp can be very hard
on some systems. There is not much to review. The patch applies cleanly to
HEAD. The function generates valid version 4 UUIDs. The code and
the documentation style seem to fit in the pgcrypto extension. I am marking
it as Ready for Committer.
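
Usage is straightforward (a quick illustration, assuming the patched
pgcrypto is installed):

CREATE EXTENSION pgcrypto;
SELECT gen_random_uuid();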

The problem is that users probably would not look in the pgcrypto extension
for a UUID generator, especially when there is another extension with uuid in
its name. Also, a UUID generator does not sound like a cryptographic function.
It would be much better if this were in core with the UUID type. There
is a reference on the UUID type documentation page to the uuid-ossp
extension. We can add a reference to the pgcrypto extension on that page and
consider adding a deprecation note to the uuid-ossp extension, if it is not
possible to add the function to core for now.




Re: [HACKERS] [PATCH] Fix double-inclusion of pg_config_os.h when building extensions with Visual Studio

2014-01-17 Thread Magnus Hagander
On Sat, Jan 11, 2014 at 8:57 AM, Craig Ringer cr...@2ndquadrant.com wrote:

 Hi all

 Related to the earlier comments about building extensions on Windows, I
 just noticed that we don't treat WINDLL as equivalent to WIN32, and
 WIN32 isn't set in a Visual Studio DLL project.

 We should actually be testing _WIN32, which is the compiler's
 pre-defined macro. The attached patch to c.h takes care of that - it
 tests _WIN32 in c.h and sets WIN32 early if it's found.

 _WIN32 is set for both win32 and win64, like we expect from WIN32.


Regardless of where the other thread goes, this seems like something we
should fix. Thus - applied, with minor changes to the comment, thanks.

My understanding is that this change alone doesn't actually help us very
much, so I haven't backpatched it anywhere. Let me know if that
understanding was incorrect, and it would actually help as a backpatch.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Feature request: Logging SSL connections

2014-01-17 Thread Magnus Hagander
On Sun, Dec 8, 2013 at 10:27 AM, Marko Kreen mark...@gmail.com wrote:

 On Fri, Dec 06, 2013 at 02:53:27PM +0100, Dr. Andreas Kunert wrote:
  Anything else missing?
  
  Functionally it's fine now, but I see few style problems:
  
  - if (port->ssl > 0) is wrong, ->ssl is a pointer.  So use just
 if (port->ssl).
  - Deeper indentation would look nicer with braces.
  - There are some duplicated message, could you restructure it so that
 each message exists only once.
 
  New version is attached. I could add braces before and after the
  ereport()-calls too, but then I need two more #ifdefs to catch the
  closing braces.

 Thank you.  Looks good now.  I added it to next commitfest:

   https://commitfest.postgresql.org/action/patch_view?id=1324



Applied, thanks!

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] patch: option --if-exists for pg_dump

2014-01-17 Thread Pavel Stehule
Hello


2014/1/16 Jeevan Chalke jeevan.cha...@enterprisedb.com

 Hi Pavel,

 I have reviewed the patch and here are my concerns and notes:

 POSITIVES:
 ---
 1. Patch applies with some white-space errors.
 2. make / make install / make check is smooth. No issues as such.
 3. Feature looks good as well.
 4. NO concern on overall design.
 5. Good work.


 NEGATIVES:
 ---

 Here are the points which I see in the review and would like you to have
 your attention.

 1.
 +It use conditional commands (with <literal>IF EXISTS</literal>

 Grammar mistakes. use = uses

 2.
 @@ -55,7 +55,8 @@ static ArchiveHandle *_allocAH(const char *FileSpec,
 const ArchiveFormat fmt,
  const int compression, ArchiveMode mode, SetupWorkerPtr
 setupWorkerPtr);
  static void _getObjectDescription(PQExpBuffer buf, TocEntry *te,
   ArchiveHandle *AH);
 -static void _printTocEntry(ArchiveHandle *AH, TocEntry *te,
 RestoreOptions *ropt, bool isData, bool acl_pass);
 +static void _printTocEntry(ArchiveHandle *AH, TocEntry *te,
 RestoreOptions *ropt,
 +bool isData, bool acl_pass);
  static char *replace_line_endings(const char *str);
  static void _doSetFixedOutputState(ArchiveHandle *AH);
  static void _doSetSessionAuth(ArchiveHandle *AH, const char *user);
 @@ -234,6 +235,7 @@ RestoreArchive(Archive *AHX)
 boolparallel_mode;
 TocEntry   *te;
 OutputContext sav;
 +

 AH->stage = STAGE_INITIALIZING;

 @@ -2961,7 +3005,8 @@ _getObjectDescription(PQExpBuffer buf, TocEntry *te,
 ArchiveHandle *AH)
  }

  static void
 -_printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt,
 bool isData, bool acl_pass)
 +_printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt,
 bool isData,
 + bool acl_pass)
  {
 /* ACLs are dumped only during acl pass */
 if (acl_pass)

 Above changes are NOT at all related to the patch. Please remove them even
 though they are clean-up like changes. Don't mix them with actual changes.

 3.
 +   if (strncmp(te->dropStmt + desc_len + 5, " IF EXISTS", 9) != 0)

  " IF EXISTS" has 10 characters NOT 9.

 4.
 +   if (strncmp(te->dropStmt + desc_len + 5, " IF EXISTS", 9) != 0)
 +   ahprintf(AH, "DROP %s IF EXISTS %s",
 + te->desc,
 + te->dropStmt + 6 + desc_len);

 Here you have used strncmp, starting at te-dropStmt + X,
 where X = desc_len + 5. While adding back you used X = 6 + desc_len.
 First time you used 5 as you added space in comparison but for adding back
 we
 want past space location and thus you have used 6. That's correct, but
 little
 bit confusing. Why not you simply used
 +   if (strstr(te->dropStmt, "IF EXISTS") != NULL)
 to check whether drop statement has IF EXISTS or not like you did at some
 other place. This will remove my concern 3 and above confusion as well.
 What you think ?

 5.
 +   }
 +
 +   else

 Extra line before else part. Please remove it for consistency.

 6.
 +   printf(_(  --if-exists  use IF EXISTS when dropping
 objects\n));  (pg_dump)
 +   printf(_(  --if-exists  don't report error if cleaned
 object doesn't exist\n));  (pg_dumpall)
 +   printf(_(  --if-exists  use IF EXISTS when dropping
 objects\n));  (pg_restore)

 Please have same message for all three.

 7.
 printf(_(  --binary-upgrade for use by upgrade utilities
 only\n));
 printf(_(  --column-inserts dump data as INSERT commands
 with column names\n));
 +   printf(_(  --if-exists  don't report error if cleaned
 object doesn't exist\n));
 printf(_(  --disable-dollar-quoting disable dollar quoting, use
 SQL standard quoting\n));
 printf(_(  --disable-triggers   disable triggers during
 data-only restore\n));

 Please maintain order like pg_dump and pg_restore. Also at variable
 declaration
 and at options parsing mechanism.


I have fixed the previous issues; see the attachment, please.


 8.
 +   if (if_exists  !outputClean)
 +   exit_horribly(NULL, option --if-exists requires -c/--clean
 option\n);

 Are we really want to exit when -c is not provided ? Can't we simply ignore
 --if-exists in that case (or with emitting a warning) ?


This behavior is based on a discussion related to the proposal of this
feature, and I think this behavior is a little bit safer: ignoring requested
functionality is not what I like. And the error message is simple and clean
in this case; it is not difficult to use, and it is not difficult for the
user to fix the missing option.

Regards

Pavel




 Marking Waiting on author.

 Thanks


 --
 Jeevan B Chalke
 Principal Software Engineer, Product Development
 EnterpriseDB Corporation
 The Enterprise PostgreSQL Company


commit 3fecb7805261e96db764fffcae2bb912538903f5
Author: Pavel Stehule pavel.steh...@gmail.com
Date:   Fri 

[HACKERS] wal_buffers = -1

2014-01-17 Thread Magnus Hagander
Is there any real use-case for not setting wal_buffers to -1 these days?

Or should we just remove it and use the -1 behaviour at all times?

IIRC we discussed not keeping it at all when the autotune behavior was
introduced, but said we wanted to keep it just in case. If we're not
ready to remove it, then does that just mean that we need to fix it so we
can?
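
For context, the two modes being discussed look roughly like this in
postgresql.conf (values are illustrative; the 16MB auto-tune cap and the
32MB figure come up later in this thread):

wal_buffers = -1        # auto-tune from shared_buffers, capped at 16MB
wal_buffers = 32MB      # explicit setting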

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Thom Brown
On 17 January 2014 13:01, Magnus Hagander mag...@hagander.net wrote:
 Is there any real use-case for not setting wal_buffers to -1 these days?

 Or should we just remove it and use the -1 behaviour at all times?

 IIRC we discussed not keeping it at all when the autotune behavior was
 introduced, but said we wanted to keep it just in case. If we're not ready
 to remove it, then does that just mean that we need to fix it so we can?

Robert Haas reported that setting it to 32MB can yield a considerable
performance benefit:

http://www.postgresql.org/message-id/CA+TgmobgMv_aaakLoasBt=5iYfi=kdcOUz0X9TdYe0c2SZ=2...@mail.gmail.com

-- 
Thom




Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Andres Freund
Hi,

On 2014-01-17 14:01:56 +0100, Magnus Hagander wrote:
 Is there any real use-case for not setting wal_buffers to -1 these days?
 
 Or should we just remove it and use the -1 behaviour at all times?

I have seen improvements by setting it larger than the maximum value
that -1 produces. Also, for some workloads (low latency) it can be
beneficial to have a small s_b but still have a larger wal_buffers setting.

So -1 for removing it from me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Magnus Hagander
On Fri, Jan 17, 2014 at 2:07 PM, Thom Brown t...@linux.com wrote:

 On 17 January 2014 13:01, Magnus Hagander mag...@hagander.net wrote:
  Is there any real use-case for not setting wal_buffers to -1 these days?
 
  Or should we just remove it and use the -1 behaviour at all times?
 
  IIRC we discussed not keeping it at all when the autotune behavior was
  introduced, but said we wanted to keep it just in case. If we're not
 ready
  to remove it, then does that just mean that we need to fix it so we can?

 Robert Haas reported that setting it to 32MB can yield a considerable
 performance benefit:


 http://www.postgresql.org/message-id/CA+TgmobgMv_aaakLoasBt=5iYfi=kdcOUz0X9TdYe0c2SZ=2...@mail.gmail.com


In that case, shouldn't the autotuning be changed to not limit it at 16MB?
:)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Dave Chinner
On Thu, Jan 16, 2014 at 03:58:56PM -0800, Jeff Janes wrote:
 On Thu, Jan 16, 2014 at 3:23 PM, Dave Chinner da...@fromorbit.com wrote:
 
  On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
   On 1/15/14, 12:00 AM, Claudio Freire wrote:
   My completely unproven theory is that swapping is overwhelmed by
   near-misses. Ie: a process touches a page, and before it's
   actually swapped in, another process touches it too, blocking on
   the other process' read. But the second process doesn't account
   for that page when evaluating predictive models (ie: read-ahead),
   so the next I/O by process 2 is unexpected to the kernel. Then
   the same with 1. Etc... In essence, swap, by a fluke of its
   implementation, fails utterly to predict the I/O pattern, and
   results in far sub-optimal reads.
   
   Explicit I/O is free from that effect, all read calls are
   accountable, and that makes a difference.
   
   Maybe, if the kernel could be fixed in that respect, you could
   consider mmap'd files as a suitable form of temporary storage.
   But that would depend on the success and availability of such a
   fix/patch.
  
   Another option is to consider some of the more radical ideas in
   this thread, but only for temporary data. Our write sequencing and
   other needs are far less stringent for this stuff.  -- Jim C.
 
  I suspect that a lot of the temporary data issues can be solved by
  using tmpfs for temporary files
 
 
 Temp files can collectively reach hundreds of gigs.

So unless you have terabytes of RAM you're going to have to write
them back to disk.

But there's something here that I'm not getting - you're talking
about a data set that you want to keep cache resident that is at
least an order of magnitude larger than the cyclic 5-15 minute WAL
dataset that ongoing operations need to manage to avoid IO storms.
Where do these temporary files fit into this picture, how fast do
they grow, and why do they need to be so large in comparison to
the ongoing modifications being made to the database?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com




Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Dave Chinner
On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
 On 1/15/14, 12:00 AM, Claudio Freire wrote:
 My completely unproven theory is that swapping is overwhelmed by
 near-misses. Ie: a process touches a page, and before it's
 actually swapped in, another process touches it too, blocking on
 the other process' read. But the second process doesn't account
 for that page when evaluating predictive models (ie: read-ahead),
 so the next I/O by process 2 is unexpected to the kernel. Then
 the same with 1. Etc... In essence, swap, by a fluke of its
 implementation, fails utterly to predict the I/O pattern, and
 results in far sub-optimal reads.
 
 Explicit I/O is free from that effect, all read calls are
 accountable, and that makes a difference.
 
 Maybe, if the kernel could be fixed in that respect, you could
 consider mmap'd files as a suitable form of temporary storage.
 But that would depend on the success and availability of such a
 fix/patch.
 
 Another option is to consider some of the more radical ideas in
 this thread, but only for temporary data. Our write sequencing and
 other needs are far less stringent for this stuff.  -- Jim C.

I suspect that a lot of the temporary data issues can be solved by
using tmpfs for temporary files

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com




Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Dave Chinner
On Thu, Jan 16, 2014 at 08:48:24PM -0500, Robert Haas wrote:
 On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner da...@fromorbit.com wrote:
  But there's something here that I'm not getting - you're talking
  about a data set that you want ot keep cache resident that is at
  least an order of magnitude larger than the cyclic 5-15 minute WAL
  dataset that ongoing operations need to manage to avoid IO storms.
  Where do these temporary files fit into this picture, how fast do
  they grow and why are do they need to be so large in comparison to
  the ongoing modifications being made to the database?

[ snip ]

 Temp files are something else again.  If PostgreSQL needs to sort a
 small amount of data, like a kilobyte, it'll use quicksort.  But if it
 needs to sort a large amount of data, like a terabyte, it'll use a
 merge sort.[1] 

IOWs the temp files contain data that requires transformation as
part of a query operation. So, temp file size is bound by the
dataset, growth determined by data retrieval and transformation
rate.

IOWs, there are two very different IO and caching requirements in
play here and tuning the kernel for one actively degrades the
performance of the other. Right, got it now.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com




Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Jeff Layton
On Thu, 16 Jan 2014 20:48:24 -0500
Robert Haas robertmh...@gmail.com wrote:

 On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner da...@fromorbit.com wrote:
  But there's something here that I'm not getting - you're talking
  about a data set that you want ot keep cache resident that is at
  least an order of magnitude larger than the cyclic 5-15 minute WAL
  dataset that ongoing operations need to manage to avoid IO storms.
  Where do these temporary files fit into this picture, how fast do
  they grow and why are do they need to be so large in comparison to
  the ongoing modifications being made to the database?
 
 I'm not sure you've got that quite right.  WAL is fsync'd very
 frequently - on every commit, at the very least, and multiple times
 per second even there are no commits going on just to make sure we get
 it all down to the platter as fast as possible.  The thing that causes
 the I/O storm is the data file writes, which are performed either when
 we need to free up space in PostgreSQL's internal buffer pool (aka
 shared_buffers) or once per checkpoint interval (5-60 minutes) in any
 event.  The point of this system is that if we crash, we're going to
 need to replay all of the WAL to recover the data files to the proper
 state; but we don't want to keep WAL around forever, so we checkpoint
 periodically.  By writing all the data back to the underlying data
 files, checkpoints render older WAL segments irrelevant, at which
 point we can recycle those files before the disk fills up.
 

So this says to me that the WAL is a place where DIO should really be
reconsidered. It's mostly sequential writes that need to hit the disk
ASAP, and you need to know that they have hit the disk before you can
proceed with other operations.

Also, is the WAL actually ever read under normal (non-recovery)
conditions or is it write-only under normal operation? If it's seldom
read, then using DIO for them also avoids some double buffering since
they wouldn't go through pagecache.

Again, I think this discussion would really benefit from an outline of
the different files used by pgsql, and what sort of data access
patterns you expect with them.

 Temp files are something else again.  If PostgreSQL needs to sort a
 small amount of data, like a kilobyte, it'll use quicksort.  But if it
 needs to sort a large amount of data, like a terabyte, it'll use a
 merge sort.[1]  The reason is of course that quicksort requires random
 access to work well; if parts of quicksort's working memory get paged
 out during the sort, your life sucks.  Merge sort (or at least our
 implementation of it) is slower overall, but it only accesses the data
 sequentially.  When we do a merge sort, we use files to simulate the
 tapes that Knuth had in mind when he wrote down the algorithm.  If the
 OS runs short of memory - because the sort is really big or just
 because of other memory pressure - it can page out the parts of the
 file we're not actively using without totally destroying performance.
 It'll be slow, of course, because disks always are, but not like
 quicksort would be if it started swapping.
 
 I haven't actually experienced (or heard mentioned) the problem Jeff
 Janes is mentioning where temp files get written out to disk too
 aggressively; as mentioned before, the problems I've seen are usually
 the other way - stuff not getting written out aggressively enough.
 But it sounds plausible.  The OS only lets you set one policy, and if
 you make that file right for permanent data files that get
 checkpointed it could well be wrong for temp files that get thrown
 out.  Just stuffing the data on RAMFS will work for some
 installations, but might not be good if you actually do want to
 perform sorts whose size exceeds RAM.
 
 BTW, I haven't heard anyone on pgsql-hackers say they'd be interesting
 in attending Collab on behalf of the PostgreSQL community.  Although
 the prospect of a cross-country flight is a somewhat depressing
 thought, it does sound pretty cool, so I'm potentially interested.  I
 have no idea what the procedure is here for moving forward though,
 especially since it sounds like there might be only one seat available
 and I don't know who else may wish to sit in it.
 


-- 
Jeff Layton jlay...@redhat.com




Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Thom Brown
On 17 January 2014 13:20, Magnus Hagander mag...@hagander.net wrote:
 On Fri, Jan 17, 2014 at 2:07 PM, Thom Brown t...@linux.com wrote:

 On 17 January 2014 13:01, Magnus Hagander mag...@hagander.net wrote:
  Is there any real use-case for not setting wal_buffers to -1 these days?
 
  Or should we just remove it and use the -1 behaviour at all times?
 
  IIRC we discussed not keeping it at all when the autotune behavior was
  introduced, but said we wanted to keep it just in case. If we're not
  ready
  to remove it, then does that just mean that we need to fix it so we can?

 Robert Haas reported that setting it to 32MB can yield a considerable
 performance benefit:


 http://www.postgresql.org/message-id/CA+TgmobgMv_aaakLoasBt=5iYfi=kdcOUz0X9TdYe0c2SZ=2...@mail.gmail.com


 In that case, sholdn't the autotuning be changed to not limit it at 16MB? :)

Well, that's the question.  Do we have a heuristic sweet-spot that
folk would agree on?

-- 
Thom




Re: [HACKERS] plpgsql.consistent_into

2014-01-17 Thread Marti Raudsepp
On Wed, Jan 15, 2014 at 8:23 AM, Jim Nasby j...@nasby.net wrote:
 Do we actually support = right now? We already support

 v_field := field FROM table ... ;

 and I think it's a bad idea to have different meaning for = and :=.

That was already discussed before. Yes, we support both = and := and
they have exactly the same behavior; I agree, we should keep them
equivalent.
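
A minimal illustration (both spellings are accepted by PL/pgSQL):

DO $$
DECLARE
    v_field integer;
BEGIN
    v_field := 1;   -- documented assignment operator
    v_field = 2;    -- accepted as an equivalent spelling
END $$;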

Regards,
Marti




Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Robert Haas
On Fri, Jan 17, 2014 at 7:34 AM, Jeff Layton jlay...@redhat.com wrote:
 So this says to me that the WAL is a place where DIO should really be
 reconsidered. It's mostly sequential writes that need to hit the disk
 ASAP, and you need to know that they have hit the disk before you can
 proceed with other operations.

Ironically enough, we actually *have* an option to use O_DIRECT here.
But it doesn't work well.  See below.

 Also, is the WAL actually ever read under normal (non-recovery)
 conditions or is it write-only under normal operation? If it's seldom
 read, then using DIO for them also avoids some double buffering since
 they wouldn't go through pagecache.

This is the first problem: if replication is in use, then the WAL gets
read shortly after it gets written.  Using O_DIRECT bypasses the
kernel cache for the writes, but then the reads stink.  However, if
you configure wal_sync_method=open_sync and disable replication, then
you will in fact get O_DIRECT|O_SYNC behavior.

But that still doesn't work out very well, because now the guy who
does the write() has to wait for it to finish before he can do
anything else.  That's not always what we want, because WAL gets
written out from our internal buffers for multiple different reasons.
If we're forcing the WAL out to disk because of transaction commit or
because we need to write the buffer protected by a certain WAL record
only after the WAL hits the platter, then it's fine.  But sometimes
we're writing WAL just because we've run out of internal buffer space,
and we don't want to block waiting for the write to complete.  Opening
the file with O_SYNC deprives us of the ability to control the timing
of the sync relative to the timing of the write.

 Again, I think this discussion would really benefit from an outline of
 the different files used by pgsql, and what sort of data access
 patterns you expect with them.

I think I more or less did that in my previous email, but here it is
again in briefer form:

- WAL files are written (and sometimes read) sequentially and fsync'd
very frequently and it's always good to write the data out to disk as
soon as possible
- Temp files are written and read sequentially and never fsync'd.
They should only be written to disk when memory pressure demands it
(but are a good candidate when that situation comes up)
- Data files are read and written randomly.  They are fsync'd at
checkpoint time; between checkpoints, it's best not to write them
sooner than necessary, but when the checkpoint arrives, they all need
to get out to the disk without bringing the system to a standstill

We have other kinds of files, but off-hand I'm not thinking of any
that are really very interesting, apart from those.

Maybe it'll be useful to have hints that say "always write this file
to disk as quick as you can" and "always postpone writing this file to
disk for as long as you can" for WAL and temp files respectively.  But
the rule for the data files, which are the really important case, is
not so simple.  fsync() is actually a fine API except that it tends to
destroy system throughput.  Maybe what we need is just for fsync() to
be less aggressive, or a less aggressive version of it.  We wouldn't
mind waiting an almost arbitrarily long time for fsync to complete if
other processes could still get their I/O requests serviced in a
reasonable amount of time in the meanwhile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WAL Rate Limiting

2014-01-17 Thread Robert Haas
On Thu, Jan 16, 2014 at 10:56 AM, Andres Freund and...@2ndquadrant.com wrote:
 That'd be most places doing XLogInsert() if you want to be
 consistent. Each needing to be analyzed whether we're blocking important
 resources, determining where in the callstack we can do the
 CHECK_FOR_WAL_BUDGET().

And doing that probably wouldn't be a problem except for the fact that
this patch is showing up at the absolute very last minute with
multiple unresolved design considerations.  I'm with Tom: this should
be postponed to 9.5.

That having been said, I bet it could be done at the tail of
XLogInsert().  If there are cases where that's not desirable, then add
macros HOLDOFF_WAL_THROTTLING() and RESUME_WAL_THROTTLING() that bump
a counter up and down.  When the counter is 0, XLogInsert() doesn't
sleep; when RESUME_WAL_THROTTLING() drops the counter to 0, it also
considers sleeping.  I suspect only a few places would need to do
this, like where we're holding one of the SLRU locks.  Some testing
would be needed to figure out exactly which places cause problems, but
that's why it's a good idea to start discussion of your proposed
feature before January 14th.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WAL Rate Limiting

2014-01-17 Thread Andres Freund
On 2014-01-17 09:04:54 -0500, Robert Haas wrote:
 That having been said, I bet it could be done at the tail of
 XLogInsert().  if there are cases where that's not desirable, then add
 macros HOLDOFF_WAL_THROTTLING() and RESUME_WAL_THROTTLING() that bump
 a counter up and down.  When the counter is 0, XLogInsert() doesn't
 sleep; when RESUME_WAL_THROTTLING() drops the counter to 0, it also
 considers sleeping.  I suspect only a few places would need to do
 this, like where we're holding one of the SLRU locks.

I don't think there are many locations where this would be OK. Sleeping
while holding exclusive buffer locks? Quite possibly inside a critical
section?
Surely not.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread David Rowley
On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug f...@phlo.org wrote:

 I had some more fun with this, the result is v2.5 of the patch (attached).
 Changes are explained below.


Looks like you've been busy on this! Thank you for implementing all the
changes you talked about.
I've now started working on the 2.5 patch you sent and I've ripped out all the
numeric inverse transition functions from that patch and implemented inverse
transitions for max and min using the new NULL-returning method that you've
implemented. I've also made a very quick pass and fixed up a few comments
and some random whitespace. I've not gotten to the docs yet.

I did add a whole bunch of regression tests for all of the inverse
transition functions that are used for max and min. I'm just calling these
like:
SELECT int4larger_inv(3,2),int4larger_inv(3,3),int4larger_inv(3,4);
Rather than using the aggregate as a window function each time, I thought
it was better to validate each of those functions individually and then put
in various tests that target a smaller number of aggregates with various
inputs.
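
For context, the sort of query these inverse transition functions are
meant to speed up is a moving-frame window aggregate, e.g. (illustrative
only):

SELECT i, max(i) OVER (ORDER BY i ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM generate_series(1, 10) AS t(i);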

A few things still to do that I'll try and get to soon.
1. More testing
2. Docs updated.
3. Perhaps I'll look at adding bitand and bitor inverse transition functions
4. ?

Thanks again for putting so much effort into this.
If you make any more changes would it be possible for you to base them on
the attached patch instead of the last one you sent?

Perhaps if there's much more hacking to do we could start pushing to a
github repo to make it easier to work together on this.

Regards

David Rowley


inverse_transition_functions_v2.6.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread Florian Pflug
On Jan17, 2014, at 15:14 , David Rowley dgrowle...@gmail.com wrote:
 If you make any more changes would it be possible for you to base them on the 
 attached patch instead of the last one you sent?

Sure! The only reason I didn't rebase my last patch onto yours was that having 
the numeric stuff in there meant potentially better test coverage. Thanks for 
doing the merge!

 Perhaps if there's much more hacking to do we could start pushing to a guthub 
 repo to make it easier to work together on this.

Yeah, that seems like a good idea. Easier to keep track of than a bunch of 
patches floating around. Could you push your latest version?

best regards,
Florian Pflug





Re: [HACKERS] WAL Rate Limiting

2014-01-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-17 09:04:54 -0500, Robert Haas wrote:
 That having been said, I bet it could be done at the tail of
 XLogInsert().

 I don't think there are many locations where this would be ok. Sleeping
 while holding exclusive buffer locks? Quite possibly inside a criticial
 section?

More or less by definition, you're always doing both when you call
XLogInsert.

 Surely not.

I agree.  It's got to be somewhere further up the call stack.

I'm inclined to think that what we ought to do is reconceptualize
vacuum_delay_point() as something a bit more generic, and sprinkle
calls to it in a few more places than now.

It's also interesting to wonder about the relationship to
CHECK_FOR_INTERRUPTS --- although I think that currently, we assume
that that's *cheap* (1 test and branch) as long as nothing is pending.
I don't want to see a bunch of arithmetic added to it.

regards, tom lane




Re: [HACKERS] WAL Rate Limiting

2014-01-17 Thread Simon Riggs
On 16 January 2014 17:20, Simon Riggs si...@2ndquadrant.com wrote:

 Thank you everyone for responding so positively to this proposal. It
 is something many users have complained about.

 In time, I think we might want both WAL Rate Limiting and I/O Rate
 Limiting. IMHO I/O rate limiting is harder and so this proposal is
 restricted solely to WAL rate limiting.

 I'm comfortable with a session level parameter. I was mulling over a
 wal_rate_limit_scope parameter but I think that is too much.
 I will follow Craig's proposal to define this in terms of MB/s, adding
 that I would calc from beginning of statement to now, so it is
 averaged rate. Setting of zero means unlimited. The rate would be
 per-session (in this release) rather than a globally calculated
 setting.

Parameter renamed to wal_rate_limit.

Not yet modified to produce this in MB/s

 I would like to call it CHECK_FOR_WAL_RATE_LIMIT()

Used CHECK_WAL_BUDGET() as proposed

 That seems like a cheap enough call to allow it to be added in more
 places, so my expanded list of commands touched are
  CLUSTER/VACUUM FULL
  ALTER TABLE (only in phase 3 table rewrite)
  ALTER TABLE SET TABLESPACE
  CREATE INDEX
 which are already there, plus new ones suggested/implied
  COPY
  CREATE TABLE AS SELECT
  INSERT/UPDATE/DELETE

Added, no problems or technical difficulties encountered.

 Please let me know if I missed something (rather than debating what is
 or is not a maintenance command).

 Any other thoughts before I cut v2 ?

v2 attached

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


wal_rate_limiting.v2.patch
Description: Binary data



Re: [HACKERS] WAL Rate Limiting

2014-01-17 Thread Heikki Linnakangas

On 01/17/2014 05:20 PM, Simon Riggs wrote:

+   if (RelationNeedsWAL(indexrel))
+   CHECK_FOR_WAL_BUDGET();


I don't think we need the RelationNeedsWAL tests. If the relation is not 
WAL-logged, you won't write much WAL, so you should easily stay under 
the limit. And CHECK_FOR_WAL_BUDGET() better be cheap when you're below 
the limit.


- Heikki




Re: [HACKERS] WAL Rate Limiting

2014-01-17 Thread Heikki Linnakangas
It occurs to me that this is very similar to the method I proposed in 
June to enforce a hard limit on WAL usage, to avoid PANIC caused by 
running out of disk space when writing WAL:


http://www.postgresql.org/message-id/51b095fe.6050...@vmware.com

Enforcing a global limit needs more book-keeping than rate limiting 
individual sessions. But I'm hoping that the CHECK_WAL_BUDGET() calls 
could be used for both purposes.


- Heikki




Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Mel Gorman
On Thu, Jan 16, 2014 at 04:30:59PM -0800, Jeff Janes wrote:
 On Wed, Jan 15, 2014 at 2:08 AM, Mel Gorman mgor...@suse.de wrote:
 
  On Tue, Jan 14, 2014 at 09:30:19AM -0800, Jeff Janes wrote:
   
That could be something we look at. There are cases buried deep in the
VM where pages get shuffled to the end of the LRU and get tagged for
reclaim as soon as possible. Maybe you need access to something like
that via posix_fadvise to say reclaim this page if you need memory but
leave it resident if there is no memory pressure or something similar.
Not exactly sure what that interface would look like or offhand how it
could be reliably implemented.
   
  
   I think the reclaim this page if you need memory but leave it resident
  if
   there is no memory pressure hint would be more useful for temporary
   working files than for what was being discussed above (shared buffers).
When I do work that needs large temporary files, I often see physical
   write IO spike but physical read IO does not.  I interpret that to mean
   that the temporary data is being written to disk to satisfy either
   dirty_expire_centisecs or dirty_*bytes, but the data remains in the FS
   cache and so disk reads are not needed to satisfy it.  So a hint that
  says
   this file will never be fsynced so please ignore dirty_*bytes and
   dirty_expire_centisecs.
 
  It would be good to know if dirty_expire_centisecs or dirty ratio|bytes
  were the problem here.
 
 
 Is there an easy way to tell?  I would guess it has to be at least
 dirty_expire_centisecs, if not both, as a very large sort operation takes a
 lot more than 30 seconds to complete.
 

There is not an easy way to tell. To be 100% sure, it would require an
instrumentation patch or a systemtap script to detect when a particular page
is being written back and track the context. There are approximations though.
Monitor nr_dirty pages over time. If at the time of the stall there are fewer
dirty pages than allowed by dirty_ratio, then dirty_expire_centisecs
kicked in. Alternatively, monitor the process for stalls; when it stalls, check
/proc/PID/stack and see if it's stuck in balance_dirty_pages or something
similar, which would indicate the process hit dirty_ratio.

  An interface that forces a dirty page to stay dirty
  regardless of the global system would be a major hazard. It potentially
  allows the creator of the temporary file to stall all other processes
  dirtying pages for an unbounded period of time.
 
 Are the dirty ratio/bytes limits the mechanisms by which adequate clean
 memory is maintained? 

Yes, for file-backed pages.

 I thought those were there just to but a limit on
 long it would take to execute a sync call should one be issued, and there
 were other setting which said how much clean memory to maintain.  It should
 definitely write out the pages if it needs the memory for other things,
 just not write them out due to fear of how long it would take to sync it if
 a sync was called.  (And if it needs the memory, it should be able to write
 it out quickly as the writes would be mostly sequential, not
 random--although how the kernel can believe me that that will always be the
 case could a problem)
 

It has been suggested on more than one occasion that a more sensible
interface would be to not allow more dirty data than can be written back in
N seconds. The details of how to implement this are tricky and no one
has taken up the challenge yet.

  I proposed in another part
  of the thread a hint for open inodes to have the background writer thread
  ignore dirty pages belonging to that inode. Dirty limits and fsync would
  still be obeyed. It might also be workable for temporary files but the
  proposal could be full of holes.
 
 
 If calling fsync would fail with an error, would that lower the risk of DoS?
 

I do not understand the proposal. If there are pages that must remain
dirty and that the kernel cannot touch, then there is a risk that
dirty_ratio's worth of pages are all untouchable and the system livelocks
until userspace takes an action.

That still leaves the possibility of flagging temp pages that should
only be written to disk if the kernel really needs to.

-- 
Mel Gorman
SUSE Labs




Re: [HACKERS] Funny representation in pg_stat_statements.query.

2014-01-17 Thread Tom Lane
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 Hello, I noticed that pg_stat_statements.query can have funny values.

I don't think that's an acceptable reason for lobotomizing the parser's
ability to print error cursors, which is what your first patch does
(and without even any documentation that would keep someone from
changing it back).

The second patch might be all right, though I'm a bit disturbed by its
dependency on gram.h constants.  The numeric values of those change on a
regular basis, and who's to say that these are exactly the set of tokens
that we care about?

In the end, really the cleanest fix for this would be to get rid of the
grammar's translation of these special functions into hacky expressions.
They ought to get translated into some new node type(s) that could be
reverse-listed in standard form by ruleutils.c.  I've wanted to do that
for years, but never got around to it ...

regards, tom lane




Re: [HACKERS] Feature request: Logging SSL connections

2014-01-17 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 Applied, thanks!

Minor bikeshedding: the messages would read better, to my eye, as

user=%s database=%s SSL enabled (protocol=%s, cipher=%s)

Putting "enabled" where it is requires extra mental gymnastics on
the part of the reader.  And why the random change between "="
in one part of the string and ": " in the new part?  You could take
that last point a bit further and make it

user=%s database=%s SSL=enabled (protocol=%s, cipher=%s)

but I'm not sure if that's an improvement.

regards, tom lane




Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Hannu Krosing
On 01/17/2014 06:40 AM, Dave Chinner wrote:
 On Thu, Jan 16, 2014 at 08:48:24PM -0500, Robert Haas wrote:
 On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner da...@fromorbit.com wrote:
 But there's something here that I'm not getting - you're talking
 about a data set that you want ot keep cache resident that is at
 least an order of magnitude larger than the cyclic 5-15 minute WAL
 dataset that ongoing operations need to manage to avoid IO storms.
 Where do these temporary files fit into this picture, how fast do
 they grow and why do they need to be so large in comparison to
 the ongoing modifications being made to the database?
 [ snip ]

 Temp files are something else again.  If PostgreSQL needs to sort a
 small amount of data, like a kilobyte, it'll use quicksort.  But if it
 needs to sort a large amount of data, like a terabyte, it'll use a
 merge sort.[1] 
 IOWs the temp files contain data that requires transformation as
 part of a query operation. So, temp file size is bound by the
 dataset, 
Basically yes, though the size of the dataset can be orders of
magnitude bigger than the database in case of some queries.
 growth determined by data retrieval and transformation
 rate.

 IOWs, there are two very different IO and caching requirements in
 play here and tuning the kernel for one actively degrades the
 performance of the other. Right, got it now.
Yes. A step in the right direction would be some way to tune this
on a per-device basis, but as a large part of this in Linux seems
to be driven from the keeping-vm-clean side, I guess it will
be far from simple.

 Cheers,

 Dave.


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] currawong is not a happy animal

2014-01-17 Thread David Rowley
I've not been able to build master for a few days now on Visual Studio due to
some problems in the new test_shm_mq contrib module.
I'm not too familiar with how the Visual Studio project files are generated
so I've not managed to come up with a fix just yet.

However I have attached a fix for the many compiler warnings that currawong
has been seeing since 9c14dd2


define_WIN32.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Mel Gorman
On Wed, Jan 15, 2014 at 02:14:08PM +, Mel Gorman wrote:
  One assumption would be that Postgres is perfectly happy with the current
  kernel behaviour in which case our discussion here is done.
 
 It has been demonstrated that this statement was farcical.  The thread is
 massive just from interaction with the LSF/MM program committee. I'm hoping
 that there will be Postgres representation at LSF/MM this year to bring
 the issues to a wider audience. I expect that LSF/MM can only commit to
 one person attending the whole summit due to limited seats but we could
 be more flexible for the Postgres track itself so informal meetings
 can be arranged for the evenings and at collab summit.
 

We still have not decided on a person that can definitely attend but we'll
get back to that shortly. I wanted to revise the summary mail so that
there is a record that can be easily digested without trawling through
archives. As before if I missed something important, prioritised poorly
or emphasised incorrectly then shout at me.

On testing of modern kernels


Josh Berkus claims that most people are using Postgres with 2.6.19 and
consequently there may be poor awareness of recent kernel developments.
This is a disturbingly large window of opportunity for problems to have
been introduced.

Minimally, Postgres has concerns about IO-related stalls which may or may
not exist in current kernels. There were indications that large writes
starve reads. There have been variants of this style of bug in the past but
it's unclear what the exact shape of this problem is and if IO-less dirty
throttling affected it. It is possible that Postgres was burned in the past
by data being written back from reclaim context in low memory situations.
That would have looked like massive stalls with drops in IO throughput
but it was fixed in relatively recent kernels. Any data on historical
tests would be helpful. Alternatively, a pgbench-based reproduction test
could potentially be used by people in the kernel community that track
performance over time and have access to a suitable testing rig.

Postgres bug reports and LKML
-

It is claimed that LKML does not welcome bug reports but it's less clear
what the basis of this claim is.  Is it because the reports are ignored? A
possible explanation is that they are simply getting lost in the LKML noise
and there would be better luck if the bug report was cc'd to a specific
subsystem list. A second possibility is the bug report is against an old
kernel and unless it is reproduced on a recent kernel the bug report will
be ignored. Finally it is possible that there is not enough data available
to debug the problem. The worst explanation is that to date the problem
has not been fixable but the details of this have been lost and are now
unknown. Is it possible that some of these bug reports can be refreshed
so at least there is a chance they get addressed?

Apparently there were changes to the reclaim algorithms that crippled
performance without any sysctls. The problem may be compounded by the
introduction of adaptive replacement cache in the shape of the thrash
detection patches currently being reviewed.  Postgres investigated the
use of ARC in the past and ultimately abandoned it. Details are in the
archives (http://www.Postgres.org/search/?m=1q=arcl=1d=-1s=r). I
have not read them, just noting they exist for future reference.

Sysctls to control VM behaviour are not popular as such tuning parameters
are often used as an excuse to not properly fix the problem. Would it be
possible to describe a test case that shows 2.6.19 performing well and a
modern kernel failing? That would give the VM people a concrete basis to
work from to either fix the problem or identify exactly what sysctls are
required to make this work.

I am confident that any bug related to VM reclaim in this area has been lost.
At least, I recall no instances of it being discussed on linux-mm and it
has not featured on LSF/MM during the last years.

IO Scheduling
-

Kevin Grittner has stated that it is known that the DEADLINE and NOOP
schedulers perform better than any alternatives for most database loads.
It would be desirable to quantify this for some test case and see whether the
default scheduler can cope in some way.

The deadline scheduler makes sense to a large extent though. Postgres
is sensitive to large latencies due to IO write spikes. It is at least
plausible that deadline would give more deterministic behaviour for
parallel reads in the presence of large writes assuming there were not
ordering problems between the reads/writes and the underlying filesystem.

For reference, these IO spikes can be massive. If the shared buffer is
completely dirtied in a short space of time then it could be 20-25% of
RAM being dirtied and writeback required in typical configurations. There
have been cases where it was worked around by limiting the size of the
shared buffer to a 

Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Andrew Dunstan


On 01/17/2014 11:26 AM, David Rowley wrote:
I've not been able to build master for a few days now on visual 
studios due some problems in the new test_shm_mq contrib module.
I'm not too familiar with how the visual studio project files are 
generated so I've not yet managed to come up with a fix just yet.


However I have attached a fix for the many compiler warnings 
that currawong has been seeing since 9c14dd2



That seems perfectly sane, so I have committed it. I'll look into the 
other issue.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Filter error log statements by sqlstate

2014-01-17 Thread Tom Lane
Oskari Saarenmaa o...@ohmu.fi writes:
 17.01.2014 00:13, Tom Lane kirjoitti:
 I find it hard to follow exactly what the use-case for this is; could
 you make a defense of why we should carry a feature like this?

 I usually run PG with log_min_messages = warning and 
 log_min_error_statement = error to log any unexpected errors.  But as I 
 have a lot of check constraints in my database as well as a lot of 
 plpgsql and plpython code which raises exceptions on invalid user input 
 I also get tons of logs for statements causing expected errors which I 
 have to try to filter out with other tools.

But if the goal is to reduce clutter in your log, wouldn't you wish
to suppress the *entire* log entry for expected errors, not just the
SQL-statement field?  Certainly all the previous discussion about this
type of feature (and there has been plenty) has presumed that you'd want
to suppress whole entries.

 In any case, I'm finding it hard to believe that filtering by individual
 SQLSTATEs is a practical API.

 I don't know about others, but filtering by individual SQLSTATE is 
 exactly what I need - I don't want to suppress all plpgsql errors or 
 constraint violations, but I may want to suppress plpgsql RAISE 
 EXCEPTION and CHECK violations.

Meh.  Again, there's been lots of prior discussion, and I think there's
consensus that a filtering API based solely on a list of SQLSTATEs
wouldn't be widely adequate.  The most recent discussion I can find
about this is in this thread:

http://www.postgresql.org/message-id/flat/20131205204512.gd6...@eldon.alvh.no-ip.org#20131205204512.gd6...@eldon.alvh.no-ip.org

That thread references an older one

http://www.postgresql.org/message-id/flat/19791.1335902...@sss.pgh.pa.us#19791.1335902...@sss.pgh.pa.us

and I'm pretty sure that there are several others you could find with
a bit of digging.  The more ambitious proposals required decorating
ereport calls with a new kind of severity labeling, reflecting how
likely it'd be that DBAs would want to read about the particular
error in their logs.  That's be a lot of work though, and would require
us to make a lot of value judgements that might be wrong.  The main
alternative that was discussed was to filter on the basis of SQLSTATE
classes, and maybe relocate a few specific ERRCODEs to different classes
if it seemed that they were a lot unlike other things in their class.

It hasn't really been proven that SQLSTATE-class filtering would work
conveniently, but AFAICS the only way to prove or disprove that is for
somebody to code it up and use it in production.

What I'd suggest is that you revise this patch so that it can handle
filtering on the basis of either individual SQLSTATEs or whole classes,
the former being able to override the latter.  With a bit of wholesale
notation invention, an example could be

log_sqlstates = 'P0,!P0001,!42,42P05'

which would mean log all messages in class P0, except don't log P0001;
don't log anything in class 42, except always log 42P05; for everything
else, log according to log_min_messages.

If you don't like that notation, feel free to propose another.  I did
not like the one you had to start with, though, because it looked like
it was actually changing the error severity level, not just the log or
don't log decision.

BTW, I'd suggest that this filtering only happen for messages < PANIC
level; it's pretty hard to believe that anybody would ever want to
suppress a PANIC report.

Another thought here is that if you're willing to have the filter
only able to *prevent* logging, and not to let it *cause* logging
of messages that would otherwise be suppressed by log_min_messages,
it could be implemented as a loadable module using the emit_log_hook.
An experimental feature, which is what this really is, is always a lot
easier sell in that format since anybody who finds it useless needn't
pay the overhead (which I'm still concerned about ...).  But I'm not
sure how important it might be to be able to upgrade a message's log
priority, so maybe that approach would be significantly less usable.
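
For what it's worth, a minimal sketch of such an emit_log_hook module (the
policy of suppressing only plpgsql's P0001, and all the names here, are
purely illustrative):

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    static emit_log_hook_type prev_emit_log_hook = NULL;

    static void
    sqlstate_filter_hook(ErrorData *edata)
    {
        /* illustrative policy: hide plpgsql RAISE EXCEPTION (P0001) from
         * the server log, but never touch anything at PANIC or above */
        if (edata->elevel < PANIC &&
            edata->sqlerrcode == ERRCODE_RAISE_EXCEPTION)
            edata->output_to_server = false;

        if (prev_emit_log_hook)
            prev_emit_log_hook(edata);
    }

    void
    _PG_init(void)
    {
        prev_emit_log_hook = emit_log_hook;
        emit_log_hook = sqlstate_filter_hook;
    }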

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Andres Freund
Hi Mel,

On 2014-01-17 16:31:48 +, Mel Gorman wrote:
 Direct IO, buffered IO, double buffering and wishlists
 --
3. Hint that a page should be dropped immediately when IO completes.
   There is already something like this buried in the kernel internals
   and sometimes called immediate reclaim which comes into play when
   pages are bgin invalidated. It should just be a case of investigating
   if that is visible to userspace, if not why not and do it in a
   semi-sensible fashion.

bgin invalidated?

Generally, +1 on the capability to achieve such a behaviour from
userspace.

7. Allow userspace process to insert data into the kernel page cache
   without marking the page dirty. This would allow the application
   to request that the OS use the application copy of data as page
   cache if it does not have a copy already. The difficulty here
   is that the application has no way of knowing if something else
   has altered the underlying file in the meantime via something like
   direct IO. Granted, such activity has probably corrupted the database
   already but initial reactions are that this is not a safe interface
   and there are coherency concerns.

I was one of the people suggesting that capability in this thread (after
pondering about it in the back of my mind for quite some time), and I
first thought it would never be acceptable for pretty much those
reasons.
But on second thought I don't think that line of argument makes too much
sense. If such an API would require write permissions on the file -
which it surely would - it wouldn't allow an application to do anything
it previously wasn't able to.
And I don't see the dangers of concurrent direct IO as anything
new. Right now the page's contents reside in userspace memory and aren't
synced in any way with either the page cache or the actual on disk
state. And afaik there are already several data races if a file is
modified and read both via the page cache and direct io.

The scheme that'd allow us is the following:
When postgres reads a data page, it will continue to first look up the
page in its shared buffers, if it's not there, it will perform a page
cache backed read, but instruct that read to immediately remove from the
page cache afterwards (new API or, posix_fadvise() or whatever). As long
as it's in shared_buffers, postgres will not need to issue new reads, so
there's no no benefit keeping it in the page cache.
If the page is dirtied, it will be written out normally telling the
kernel to forget about the caching the page (using 3) or possibly direct
io).
When a page in postgres's buffers (which wouldn't be set to very large
values) isn't needed anymore and *not* dirty, it will seed the kernel
page cache with the current data.
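
For the read side of that, a rough sketch with what exists today
(posix_fadvise() standing in for whatever new API might appear; the function
name and layering are made up):

    /* Hedged illustration of "page cache backed read, then drop the kernel
     * copy": read one block into a (shared) buffer, then tell the kernel we
     * will not need its cached copy again. */
    #include <fcntl.h>
    #include <unistd.h>

    static ssize_t
    read_and_drop(int fd, void *buf, size_t len, off_t off)
    {
        ssize_t     nread = pread(fd, buf, len, off);

        if (nread > 0)
            (void) posix_fadvise(fd, off, (off_t) len, POSIX_FADV_DONTNEED);
        return nread;
    }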

Now, such a scheme wouldn't likely be zero-copy, but it would avoid
double buffering. I think the cost of buffer copying has been overstated
in this thread... the major advantage is that all that could easily be
implemented in a very localized manner, without hurting other OSs, and it
could easily degrade on kernels not providing that capability, which
would surely be the majority of installations for the next couple of
years.

So, I think such an interface would be hugely beneficial - and I'd be
surprised if other applications couldn't reuse it. And I don't think
it'd be all that hard to implement on the kernel side?

   Dave Chinner asked "why, exactly, do you even need the kernel page
   cache here?" when Postgres already knows how and when data should
   be written back to disk. The answer boiled down to "To let the kernel do
   the job that it is good at, namely managing the write-back of dirty
   buffers to disk and to manage (possible) read-ahead pages". Postgres
   has some ordering requirements but it does not want to be responsible
   for all cache replacement and IO scheduling. Hannu Krosing summarised
   it best as

The other part is that using the page cache for the majority of warm,
but not burning hot pages, allows the kernel to much more sensibly adapt
to concurrent workloads requiring memory in some form or other (possibly
giving it to other VMs when mostly idle and such).

8. Allow copy-on-write of page-cache pages to anonymous. This would limit
   the double ram usage to some extent. It's not as simple as having a
   MAP_PRIVATE mapping of a file-backed page because presumably they want
   this data in a shared buffer shared between Postgres processes. The
   implementation details of something like this are hairy because it's
   mmap()-like but not mmap() as it does not have the same writeback
   semantics due to the write ordering requirements Postgres has for
   database integrity.

9. Hint that a page in an anonymous buffer is a copy of a page cache
page and invalidate the page cache page on COW. This limits the

Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Magnus Hagander
On Fri, Jan 17, 2014 at 5:51 PM, Andrew Dunstan and...@dunslane.net wrote:


 On 01/17/2014 11:26 AM, David Rowley wrote:

 I've not been able to build master for a few days now on visual studios
 due some problems in the new test_shm_mq contrib module.
 I'm not too familiar with how the visual studio project files are
 generated so I've not yet managed to come up with a fix just yet.

 However I have attached a fix for the many compiler warnings that
 currawong has been seeing since 9c14dd2



 That seems perfe4ctly sane, so I have committed it. I'll look into the
 other issue.


Thanks.

I could've sworn I committed it that way, but I see now that it's sitting
unstaged in my working directory...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Andrew Dunstan


On 01/17/2014 11:51 AM, Andrew Dunstan wrote:


On 01/17/2014 11:26 AM, David Rowley wrote:
I've not been able to build master for a few days now on visual 
studios due some problems in the new test_shm_mq contrib module.
I'm not too familiar with how the visual studio project files are 
generated so I've not yet managed to come up with a fix just yet.


However I have attached a fix for the many compiler warnings that 
currawong has been seeing since 9c14dd2



That seems perfe4ctly sane, so I have committed it. I'll look into the 
other issue.





It looks like what we need to fix the immediate problem is to mark 
set_latch_on_sigusr1 with PGDLLIMPORT. Is that the only one of these 
symbols we might need to make visible?


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 It looks like what we need to fix the immediate problem is to mark 
 set_latch_on_sigusr1 with PGDLLIMPORT. Is that the only one of these 
 symbols we might need to make visible?

We've generally relied on the buildfarm to cue us to add PGDLLIMPORT.
Add that one and see what happens ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Minor improvements to sslinfo contrib module

2014-01-17 Thread Gurjeet Singh
Please find attached the patch that fixes a couple of comments, and adds
'static' qualifier to functions that are not used anywhere else in the code
base.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com http://www.enterprisedb.com
diff --git a/contrib/sslinfo/sslinfo.c b/contrib/sslinfo/sslinfo.c
index 7a58470..18405f6 100644
--- a/contrib/sslinfo/sslinfo.c
+++ b/contrib/sslinfo/sslinfo.c
@@ -31,9 +31,10 @@ Datum		ssl_client_dn_field(PG_FUNCTION_ARGS);
 Datum		ssl_issuer_field(PG_FUNCTION_ARGS);
 Datum		ssl_client_dn(PG_FUNCTION_ARGS);
 Datum		ssl_issuer_dn(PG_FUNCTION_ARGS);
-Datum		X509_NAME_field_to_text(X509_NAME *name, text *fieldName);
-Datum		X509_NAME_to_text(X509_NAME *name);
-Datum		ASN1_STRING_to_text(ASN1_STRING *str);
+
+static Datum	X509_NAME_field_to_text(X509_NAME *name, text *fieldName);
+static Datum	X509_NAME_to_text(X509_NAME *name);
+static Datum	ASN1_STRING_to_text(ASN1_STRING *str);
 
 
 /*
@@ -51,7 +52,7 @@ ssl_is_used(PG_FUNCTION_ARGS)
 
 
 /*
- * Returns SSL cipher currently in use.
+ * Returns SSL version currently in use.
  */
 PG_FUNCTION_INFO_V1(ssl_version);
 Datum
@@ -77,7 +78,7 @@ ssl_cipher(PG_FUNCTION_ARGS)
 
 
 /*
- * Indicates whether current client have provided a certificate
+ * Indicates whether current client provided a certificate
  *
  * Function has no arguments.  Returns bool.  True if current session
  * is SSL session and client certificate is verified, otherwise false.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Andrew Dunstan


On 01/17/2014 12:43 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

It looks like what we need to fix the immediate problem is to mark
set_latch_on_sigusr1 with PGDLLIMPORT. Is that the only one of these
symbols we might need to make visible?

We've generally relied on the buildfarm to cue us to add PGDLLIMPORT.
Add that one and see what happens ...





OK, done.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread David Rowley
On Sat, Jan 18, 2014 at 6:50 AM, Andrew Dunstan and...@dunslane.net wrote:


 On 01/17/2014 12:43 PM, Tom Lane wrote:

 Andrew Dunstan and...@dunslane.net writes:

 It looks like what we need to fix the immediate problem is to mark
 set_latch_on_sigusr1 with PGDLLIMPORT. Is that the only one of these
 symbols we might need to make visible?

 We've generally relied on the buildfarm to cue us to add PGDLLIMPORT.
 Add that one and see what happens ...





 OK, done.


Great, that fixes it on my end.

Thanks for fixing those.

Regards
David Rowley



 cheers

 andrew




Re: [Lsf-pc] [HACKERS] Re: Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Mel Gorman
On Fri, Jan 17, 2014 at 06:14:37PM +0100, Andres Freund wrote:
 Hi Mel,
 
 On 2014-01-17 16:31:48 +, Mel Gorman wrote:
  Direct IO, buffered IO, double buffering and wishlists
  --
 3. Hint that a page should be dropped immediately when IO completes.
There is already something like this buried in the kernel internals
and sometimes called immediate reclaim which comes into play when
pages are bgin invalidated. It should just be a case of investigating
if that is visible to userspace, if not why not and do it in a
semi-sensible fashion.
 
 bgin invalidated?
 

s/bgin/being/

I admit that invalidated in this context is very vague and I did
not explain myself. This paragraph should remind anyone familiar with
VM internals about what happens when invalidate_mapping_pages calls
deactivate_page and how PageReclaim pages are treated by both page reclaim
and end_page_writeback handler. It's similar but not identical to what
Postgres wants and is a reasonable starting position for an implementation.

 Generally, +1 on the capability to achieve such a behaviour from
 userspace.
 
 7. Allow userspace process to insert data into the kernel page cache
without marking the page dirty. This would allow the application
to request that the OS use the application copy of data as page
cache if it does not have a copy already. The difficulty here
is that the application has no way of knowing if something else
has altered the underlying file in the meantime via something like
direct IO. Granted, such activity has probably corrupted the database
already but initial reactions are that this is not a safe interface
and there are coherency concerns.
 
 I was one of the people suggesting that capability in this thread (after
 pondering about it on the back on my mind for quite some time), and I
 first though it would never be acceptable for pretty much those
 reasons.
 But on second thought I don't think that line of argument makes too much
 sense. If such an API would require write permissions on the file -
 which it surely would - it wouldn't allow an application to do anything
 it previously wasn't able to.
 And I don't see the dangers of concurrent direct IO as anything
 new. Right now the page's contents reside in userspace memory and aren't
 synced in any way with either the page cache or the actual on disk
 state. And afaik there are already several data races if a file is
 modified and read both via the page cache and direct io.
 

All of this is true.  The objections may not hold up over time and it may
seem much more reasonable when/if the easier stuff is addressed.

 The scheme that'd allow us is the following:
 When postgres reads a data page, it will continue to first look up the
 page in its shared buffers, if it's not there, it will perform a page
 cache backed read, but instruct that read to immediately remove from the
 page cache afterwards (new API or, posix_fadvise() or whatever).
 As long
 as it's in shared_buffers, postgres will not need to issue new reads, so
 there's no no benefit keeping it in the page cache.
 If the page is dirtied, it will be written out normally telling the
 kernel to forget about the caching the page (using 3) or possibly direct
 io).
 When a page in postgres's buffers (which wouldn't be set to very large
 values) isn't needed anymore and *not* dirty, it will seed the kernel
 page cache with the current data.
 

Ordinarily the initial read page could be discarded with fadvise but
the later write would cause the data to be read back in again which is a
waste. The details of avoiding that re-read are tricky from a core kernel
perspective because ordinarily the kernel at that point does not know if
the write is a full complete aligned write of an underlying filesystem
structure or not.  It may need a different write path which potentially
leads into needing changes to the address_space operations on a filesystem
basis -- that would get messy and be a Linux-specific extension. I have
not researched this properly at all, I could be way off but I have a
feeling the details get messy.

 Now, such a scheme wouldn't likely be zero-copy, but it would avoid
 double buffering.

It wouldn't be zero copy because minimally the data needs to be handed
over to the filesystem for writing to the disk and the interface for that is
offset,length based, not page based. Maybe sometimes it will be zero copy
but it would be a filesystem-specific thing.

 I think the cost of buffer copying has been overstated
 in this thread... he major advantage is that all that could easily
 implemented in a very localized manner, without hurting other OSs and it
 could easily degrade on kernels not providing that capability, which
 would surely be the majority of installations for the next couple of
 cases.
 
 So, I think such an interface would be hugely 

Re: [HACKERS] GIN improvements part 1: additional information

2014-01-17 Thread Heikki Linnakangas

On 01/17/2014 01:05 PM, Alexander Korotkov wrote:

Seems to be fixed in the attached version of patch.
Another improvement in this version: only left-most segments and further
are re-encoded. Left part of page are left untouched.


I'm looking into this now. A few quick notes:

* Even when you don't re-encode the whole page, you still WAL-log the 
whole page. While correct, it'd be a pretty obvious optimization to only 
WAL-log the modified part.


* When new items are appended to the end of the page so that only the 
last existing compressed segment is re-encoded, and the page has to be 
split, the items aren't divided 50/50 on the pages. The loop that moves 
segments destined for the left page to the right won't move any 
existing, untouched, segments.



!   /*
!* Didn't fit uncompressed. We'll have to encode them. Check if both
!* new items and uncompressed items can be placed starting from last
!* segment of page. Then re-encode only last segment of page.
!*/
!   minNewItem = newItems[0];
!   if (nolduncompressed == 0 &&
!   ginCompareItemPointers(olduncompressed[0], minNewItem) < 0)
!   minNewItem = olduncompressed[0];


That looks wrong. If I'm understanding it right, it's trying to do 
minNewItem = Min(newItems[0], olduncompressed[0]). The test should be 
nolduncompressed > 0 && ...


No need to send a new patch, I'll just fix those while reviewing...

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-17 Thread Alexander Korotkov
On Fri, Jan 17, 2014 at 10:38 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 01/17/2014 01:05 PM, Alexander Korotkov wrote:

 Seems to be fixed in the attached version of patch.
 Another improvement in this version: only left-most segments and further
 are re-encoded. Left part of page are left untouched.


 I'm looking into this now. A few quick notes:

 * Even when you don't re-encode the whole page, you still WAL-log the
 whole page. While correct, it'd be a pretty obvious optimization to only
 WAL-log the modified part.


Oh, I overlooked it. I wrote code in ginRedoInsertData which finds the
appropriate place for data but didn't notice that I still wrote the whole page
to the xlog record.


 * When new items are appended to the end of the page so that only the last
 existing compressed segment is re-encoded, and the page has to be split,
 the items aren't divided 50/50 on the pages. The loop that moves segments
 destined for the left page to the right won't move any existing, untouched,
 segments.


I think this loop will move one segment. However, it's too small.


  !   /*
 !* Didn't fit uncompressed. We'll have to encode them. Check if
 both
 !* new items and uncompressed items can be placed starting from
 last
 !* segment of page. Then re-encode only last segment of page.
 !*/
 !   minNewItem = newItems[0];
 !   if (nolduncompressed == 0 &&
 !   ginCompareItemPointers(olduncompressed[0], minNewItem) < 0)
 !   minNewItem = olduncompressed[0];


 That looks wrong. If I'm understanding it right, it's trying to do
 minNewItem = Min(newItems[0], olduncompressed[0]). The test should be
 nolduncompressed > 0 && ...


Yes, definitely a bug.


 No need to send a new patch, I'll just fix those while reviewing...


Ok, thanks!

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 [ 0001-Add-support-for-printing-Size-arguments-to-elog-erep.patch ]

I think this approach is fundamentally broken, because it can't reasonably
cope with any case more complicated than %zu or %zd.  While it's
arguable that we can get along without the ability to specify field width,
since the [U]INT64_FORMAT macros never allowed that, it is surely going
to be the case that translators will need to insert n$ flags in the
translated versions of messages.

Another objection is that this only fixes the problem in elog/ereport
format strings, not anyplace else that it might be useful to print a
size_t value.  You suggest below that we could invent some additional
macros to support that; but since the z flag is in C99, there really
ought to be only a small minority of platforms where it doesn't work.
So I don't think we should be contorting our code with some nonstandard
notation for the case, if there's any way to avoid that.

I think a better solution approach is to teach our src/port/snprintf.c
about the z flag, then extend configure's checking to force use of our
snprintf if the platform's version doesn't handle z.  While it might be
objected that this creates a need for a run-time check in configure,
we already have one to check if snprintf handles n$, so this approach
doesn't really make life any worse for cross-compiles.
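
To illustrate what that buys at the call sites (the function and values here
are made up, not a proposed change):

    #include "postgres.h"

    static void
    report_alloc(Size nbytes)
    {
        /* today: a cast plus the configure-selected width macro */
        elog(LOG, "allocated " UINT64_FORMAT " bytes", (uint64) nbytes);

        /* with a z-aware snprintf.c (or a native C99 libc): no cast, no macro */
        elog(LOG, "allocated %zu bytes", nbytes);
    }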

 In patch 01, I've modified configure to not define [U]INT64_FORMAT
 directly, but rather just define INT64_LENGTH_MODIFIER as
 appropriate. The formats are now defined in c.h.
 INT64_LENGTH_MODIFIER is defined without quotes - I am not sure whether
 that was the right choice, it requires using CppAsString2() in some
 places. Maybe I should just have defined it with quotes instead. Happy
 to rejigger it if somebody thinks the other way would be better.

Meh.  This isn't needed if we do what I suggest above, but in any case
I don't approve of removing the existing [U]INT64_FORMAT macros.
That breaks code that doesn't need to get broken, probably including
third-party modules.  It'd be more sensible to just add an additional
macro for the flag character(s).  (And yeah, I'd put quotes in it.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Andrew Dunstan


On 01/17/2014 12:55 PM, David Rowley wrote:
On Sat, Jan 18, 2014 at 6:50 AM, Andrew Dunstan and...@dunslane.net 
mailto:and...@dunslane.net wrote:



On 01/17/2014 12:43 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net
mailto:and...@dunslane.net writes:

It looks like what we need to fix the immediate problem is
to mark
set_latch_on_sigusr1 with PGDLLIMPORT. Is that the only
one of these
symbols we might need to make visible?

We've generally relied on the buildfarm to cue us to add
PGDLLIMPORT.
Add that one and see what happens ...




OK, done.


Great, that fixes it on my end.

Thanks for fixing those.



Not quite out of the woods yet. We're getting this regression failure on 
Windows/MSVC (see 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=bowerbirddt=2014-01-17%2018%3A20%3A35):



*** 
c:/prog/bf/root/HEAD/pgsql.2612/contrib/test_shm_mq/expected/test_shm_mq.out
Fri Jan 17 13:20:53 2014
--- c:/prog/bf/root/HEAD/pgsql.2612/contrib/test_shm_mq/results/test_shm_mq.out 
Fri Jan 17 13:44:33 2014
***
*** 5,18 
  -- internal sanity tests fail.
  --
  SELECT test_shm_mq(32768, (select string_agg(chr(32+(random()*96)::int), '') 
from generate_series(1,400)), 1, 1);
!  test_shm_mq
! -
!
! (1 row)
!
  SELECT test_shm_mq_pipelined(16384, (select 
string_agg(chr(32+(random()*96)::int), '') from generate_series(1,27)), 
200, 3);
!  test_shm_mq_pipelined
! ---
!
! (1 row)
!
--- 5,10 
  -- internal sanity tests fail.
  --
  SELECT test_shm_mq(32768, (select string_agg(chr(32+(random()*96)::int), '') 
from generate_series(1,400)), 1, 1);
! ERROR:  queue size must be at least 472262143 bytes
  SELECT test_shm_mq_pipelined(16384, (select 
string_agg(chr(32+(random()*96)::int), '') from generate_series(1,27)), 
200, 3);
! ERROR:  queue size must be at least 472262143 bytes




cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Re: Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Josh Berkus
Mel,

So we have a few interested parties.  What do we need to do to set up
the Collab session?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Not quite out of the woods yet. We're getting this regression failure on 
 Windows/MSVC (see 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=bowerbirddt=2014-01-17%2018%3A20%3A35):

SELECT test_shm_mq(32768, (select string_agg(chr(32+(random()*96)::int), 
 '') from generate_series(1,400)), 1, 1);
 ! ERROR:  queue size must be at least 472262143 bytes

It looks like this is computing a bogus value:

const Size shm_mq_minimum_size =
MAXALIGN(offsetof(shm_mq, mq_ring)) + MAXIMUM_ALIGNOF;

I seem to recall that we've previously found that you have to write

MAXALIGN(offsetof(shm_mq, mq_ring[0])) + MAXIMUM_ALIGNOF;

to keep MSVC happy with a reference to an array member in offsetof.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Tom Lane
I wrote:
 Meh.  This isn't needed if we do what I suggest above, but in any case
 I don't approve of removing the existing [U]INT64_FORMAT macros.
 That breaks code that doesn't need to get broken, probably including
 third-party modules.

After looking more closely I see you didn't actually *remove* those
macros, just define them in a different place/way.  So the above objection
is just noise, sorry.  (Though I think it'd be notationally cleaner to let
configure continue to define the macros; it doesn't need to do anything as
ugly as CppAsString2() to concatenate...)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Robert Haas
On Fri, Jan 17, 2014 at 8:20 AM, Magnus Hagander mag...@hagander.net wrote:
 Robert Haas reported that setting it to 32MB can yield a considerable
 performance benefit:

 http://www.postgresql.org/message-id/CA+TgmobgMv_aaakLoasBt=5iYfi=kdcOUz0X9TdYe0c2SZ=2...@mail.gmail.com

 In that case, shouldn't the autotuning be changed to not limit it at 16MB? :)

I'm in favor of keeping the setting; I think that the auto-tuning has
largely eliminated the pain in this area for the majority of users,
but that doesn't mean we should deny someone who really wants to
squeeze the last drop of performance out of their system the
opportunity to poke at it manually.  I doubt it's the least useful
setting we have.  The test above shows 32MB beating 16MB, but I think
I did other tests where 16MB and 64MB came out the same.

Back when I was working heavily on performance, I did a number of
tests to try to understand what events cause latency spikes.  Many of
those events are related to write-ahead logging.  It turns out that
writing a page from PostgreSQL's WAL buffers to the OS cache is
usually pretty fast, unless the OS thinks we're dirtying data too
quickly and decides to slam on the brakes.  Calling fsync() to get the
data out to disk, though, is very slow.  However, both of those
operations are protected by the same lock (WALWriteLock), so it's
frequently the case that no more WAL buffer space can be freed up by
calling write() because the guy who has the lock is busy waiting for
an fsync().  That sucks, because there's no intrinsic reason why we
can't have one backend doing a write() while another backend is doing
an fsync().  On a related note, there's no real reason why the poor
bastard who writes the WAL record that fills a segment should be
forced to synchronously flush the segment instead of letting it be
done in the background, but right now he is.

I think if we fix these problems, the optimal value for wal_buffers is
likely to change; however, I'm not certain we'll be able to
auto-tune it perfectly on day one.  Having a setting makes it easier
for people to experiment with different values, and I think that's
good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal, patch: allow multiple plpgsql plugins

2014-01-17 Thread Pavel Stehule
2014/1/16 Marko Tiikkaja ma...@joh.to

 Hi Pavel,

 First of all, thanks for working on this!


 On 1/12/14, 8:58 PM, Pavel Stehule wrote:

 I still not happy with plugin_info - it is only per plugin now and should
 be per plugin and per function.


 I'm not sure I understand the point of plugin_info in the first place, but
 what would having a separate info per (plugin, function) achieve that can't
 be done otherwise?


First use case - I would like to prevent repeated calls of
plpgsql_check_function in passive mode. Where do I store the information that
the first start succeeded? It is related to the function instance, so the
function oid can be ambiguous (for functions with polymorphic parameters).
When the function instance is destroyed, this information should be destroyed
with it. It is impossible to do this check from the plugin alone. Second use
case - attaching session-lifecycle plugin data to some function - for example
for coverage calculation. Inside a plugin without function-specific data you
have to hold a hash of all used functions, and you have to search it again and
again. When plpgsql holds this info in its internal function structures, you
don't need to search for anything.




Regards

Pavel





 As for the current patch, I'd like to see improvements on a few things:

   1) It doesn't currently compile because of extra semicolons in the
  PLpgSQL_plugin struct.

   2) The previous comment above the same struct still talk about the
  rendezvous variable which is now gone.  The comment should be
  updated to reflect the new API.

   3) The same comment talks about how important it is to unregister a
  plugin if its _PG_fini() is ever called, but the current API does
  not support unregistering.  That should probably be added?  I'm not
  sure when _PG_fini() would be called.

   4) The comment  /* reserved for use by optional plugin */  seems a bit
  weird in its new context.


 Regards,
 Marko Tiikkaja



Re: [HACKERS] [v9.4] row level security

2014-01-17 Thread Gregory Smith

On 12/13/13 11:40 PM, Craig Ringer wrote:

You may want to check out the updated writable security-barrier views patch.

http://www.postgresql.org/message-id/52ab112b.6020...@2ndquadrant.com

It may offer a path forward for the CF submission for RLS, letting us
get rid of the var/attr fiddling that many here objected to.


With my advocacy hat on, I'd like to revisit this idea now that there's 
a viable updatable security barrier view submission.  I thought the most 
serious showstopper feedback from the last CF's RLS submission was that 
this needed to be sorted out first.  Reworking KaiGai's submission to 
merge against Dean's new one makes it viable again in my mind, and I'd 
like to continue toward re-reviewing it as part of this CF in that 
light.  Admittedly it's not ideal to try and do that at the same time 
the barrier view patch is being modified, but I see that as a normal CF 
merge of things based on other people's submissions.


I mentioned advocacy because the budding new PostgreSQL test instances 
I'm seeing now will lose a lot of momentum if we end up with no user 
visible RLS features in 9.4.  The pieces we have now can assemble into 
something that's useful, and I don't think that goal is unreasonably far 
away.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread David Rowley
On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug f...@phlo.org wrote:


 I've now shuffled things around so that we can use inverse transition
 functions
 even if only some aggregates provide them, and to allow inverse transition
 functions to force a restart by returning NULL. The rules regarding NULL
 handling
 are now the following


Maybe this is me thinking out loud, but I'm just thinking about the numeric
case again.
Since the patch can now handle inverse transition functions returning NULL
when they fail to perform inverse transitions, I'm wondering if we could
add an expectedscale to NumericAggState: set it to -1 initially; when we get
the first value, set the expectedscale to the dscale of that numeric; then,
if we get anything but that value, set the expectedscale back to -1 again.
If we are asked to perform an inverse transition with an expectedscale of -1
we return NULL, otherwise we can perform the inverse transition...

Thoughts?

Regards

David Rowley


Re: [HACKERS] [BUGS] surprising to_timestamp behavior

2014-01-17 Thread Tom Lane
Jeevan Chalke jeevan.cha...@enterprisedb.com writes:
 Attached patch which fixes this issue.

 I have just tweaked the code around ignoring spaces in DCH_from_char()
 function.

 Adding to CommitFest 2014-01 (Open).

I went to review this, and found that there's not actually a patch
attached ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning off HOT/Cleanup sometimes

2014-01-17 Thread Robert Haas
On Wed, Jan 15, 2014 at 5:14 PM, Simon Riggs si...@2ndquadrant.com wrote:
 We already know that HOT is ineffective in areas of high contention
 (previous thread by me). Prior experience was that smaller tables
 didn't show much apparent benefit from using HOT either; its
 effectiveness was limited to medium and large tables being updated.
 The two already stated use cases that would apply are these ones

Do you have a link to that previous thread?  I don't happen to recall
that conversation.

I've found that HOT can be very important on smaller tables, so I'm
skeptical of that as a general conclusion.  What I think might be true
is that if VACUUM is going to hit the table often enough to make you
happy, then you don't really need HOT.  In other words, if the update
rate is non-zero but low, not too much cruft will accumulate before
the table gets vacuumed, and you may be OK.  If the update rate is
high, though, I think disabling HOT will be painful on a table of any
size.  There might be exceptions, but I can't think of what they are
off-hand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Alvaro Herrera
Tom Lane escribió:
 Andrew Dunstan and...@dunslane.net writes:
  Not quite out of the woods yet. We're getting this regression failure on 
  Windows/MSVC (see 
  http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=bowerbirddt=2014-01-17%2018%3A20%3A35):
 
 SELECT test_shm_mq(32768, (select string_agg(chr(32+(random()*96)::int), 
  '') from generate_series(1,400)), 1, 1);
  ! ERROR:  queue size must be at least 472262143 bytes
 
 It looks like this is computing a bogus value:
 
 const Size shm_mq_minimum_size =
   MAXALIGN(offsetof(shm_mq, mq_ring)) + MAXIMUM_ALIGNOF;
 
 I seem to recall that we've previously found that you have to write
 
   MAXALIGN(offsetof(shm_mq, mq_ring[0])) + MAXIMUM_ALIGNOF;
 
 to keep MSVC happy with a reference to an array member in offsetof.

Hmm, this seems to contradict what's documented at the definition of
FLEXIBLE_ARRAY_MEMBER:

/* Define to nothing if C supports flexible array members, and to 1 if it does
   not. That way, with a declaration like `struct s { int n; double
   d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99
   compilers. When computing the size of such an object, don't use 'sizeof
   (struct s)' as it overestimates the size. Use 'offsetof (struct s, d)'
   instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with
   MSVC and with C++ compilers. */
#define FLEXIBLE_ARRAY_MEMBER /**/
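
(As an illustrative aside, the allocation pattern that comment describes looks
like this, with a made-up struct and element count:)

    struct s
    {
        int         n;
        double      d[FLEXIBLE_ARRAY_MEMBER];
    };

    /* size by the flexible member's offset, not sizeof(struct s) */
    Size        sz = offsetof(struct s, d) + nelems * sizeof(double);
    struct s   *p = (struct s *) palloc(sz);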


-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Andrew Dunstan


On 01/17/2014 02:54 PM, Alvaro Herrera wrote:

Tom Lane escribió:

Andrew Dunstan and...@dunslane.net writes:

Not quite out of the woods yet. We're getting this regression failure on
Windows/MSVC (see
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=bowerbirddt=2014-01-17%2018%3A20%3A35):
SELECT test_shm_mq(32768, (select string_agg(chr(32+(random()*96)::int), 
'') from generate_series(1,400)), 1, 1);
! ERROR:  queue size must be at least 472262143 bytes

It looks like this is computing a bogus value:

const Size shm_mq_minimum_size =
MAXALIGN(offsetof(shm_mq, mq_ring)) + MAXIMUM_ALIGNOF;

I seem to recall that we've previously found that you have to write

MAXALIGN(offsetof(shm_mq, mq_ring[0])) + MAXIMUM_ALIGNOF;

to keep MSVC happy with a reference to an array member in offsetof.

Hmm, this seems to contradict what's documented at the definition of
FLEXIBLE_ARRAY_MEMBER:

/* Define to nothing if C supports flexible array members, and to 1 if it does
not. That way, with a declaration like `struct s { int n; double
d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99
compilers. When computing the size of such an object, don't use 'sizeof
(struct s)' as it overestimates the size. Use 'offsetof (struct s, d)'
instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with
MSVC and with C++ compilers. */
#define FLEXIBLE_ARRAY_MEMBER /**/





Well, there's a bunch of these in the source:

   [andrew@emma pg_head]$ grep -r offsetof.*\\[0 src/backend
   src/backend/access/nbtree/nbtutils.c:size = offsetof(BTVacInfo, vacuums[0]);
   src/backend/utils/adt/geo_ops.c:size = offsetof(PATH, p[0]) +sizeof(path->p[0]) * npts;
   src/backend/utils/adt/geo_ops.c:if (npts <= 0 || npts >= (int32) ((INT_MAX - offsetof(PATH, p[0])) / sizeof(Point)))
   src/backend/utils/adt/geo_ops.c:size = offsetof(PATH, p[0]) +sizeof(path->p[0]) * npts;
   src/backend/utils/adt/geo_ops.c:size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0]) * npts;
   src/backend/utils/adt/geo_ops.c:if (npts <= 0 || npts >= (int32) ((INT_MAX - offsetof(POLYGON, p[0])) / sizeof(Point)))
   src/backend/utils/adt/geo_ops.c:size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0]) * npts;
   src/backend/utils/adt/geo_ops.c:size = offsetof(PATH, p[0]) +base_size;
   src/backend/utils/adt/geo_ops.c:size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0]) * path->npts;
   src/backend/utils/adt/geo_ops.c:size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0]) * 4;
   src/backend/utils/adt/geo_ops.c:size = offsetof(PATH, p[0]) +sizeof(path->p[0]) * poly->npts;
   src/backend/utils/adt/geo_ops.c:size = offsetof(POLYGON, p[0]) +base_size;
   src/backend/postmaster/pgstat.c:len = offsetof(PgStat_MsgTabstat, m_entry[0]) +
   src/backend/postmaster/pgstat.c:pgstat_send(&msg, offsetof(PgStat_MsgFuncstat, m_entry[0]) +
   src/backend/postmaster/pgstat.c:pgstat_send(&msg, offsetof(PgStat_MsgFuncstat, m_entry[0]) +
   src/backend/postmaster/pgstat.c:len = offsetof(PgStat_MsgTabpurge, m_tableid[0])
   src/backend/postmaster/pgstat.c:len = offsetof(PgStat_MsgTabpurge, m_tableid[0])
   src/backend/postmaster/pgstat.c:len = offsetof(PgStat_MsgFuncpurge, m_functionid[0])
   src/backend/postmaster/pgstat.c:len = offsetof(PgStat_MsgFuncpurge, m_functionid[0])
   src/backend/postmaster/pgstat.c:len = offsetof(PgStat_MsgTabpurge, m_tableid[0]) +sizeof(Oid);

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Tom Lane escribió:
 I seem to recall that we've previously found that you have to write
 MAXALIGN(offsetof(shm_mq, mq_ring[0])) + MAXIMUM_ALIGNOF;
 to keep MSVC happy with a reference to an array member in offsetof.

 Hmm, this seems to contradict what's documented at the definition of
 FLEXIBLE_ARRAY_MEMBER:

Ah, I thought we had that issue documented somewhere, but failed to find
this comment, or I'd have known that was backwards.

The other possibility I was contemplating is that "export a const
variable" doesn't actually work for some reason.  We're not in the habit
of doing that elsewhere, so I don't find that theory outlandish.  Perhaps
it could be fixed by adding PGDLLIMPORT to the extern, but on the whole
I'd rather avoid the technique altogether.

The least-unlike-other-Postgres-code approach would be to go ahead and
export the struct so that the size computation could be provided as a
#define in the same header.  Robert stated a couple days ago that he
didn't foresee much churn in this struct, so that doesn't seem
unacceptable.
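
For instance, something like this in shm_mq.h (the name is purely
illustrative), once the struct is visible there:

    /* hypothetical: computable by callers at compile time */
    #define SHM_MQ_MINIMUM_SIZE \
        (MAXALIGN(offsetof(shm_mq, mq_ring)) + MAXIMUM_ALIGNOF)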

Another possibility is to refactor so that testing an allocation request
against shm_mq_minimum_size is the responsibility of storage/ipc/shm_mq.c,
not some random code in a contrib module.  It's not immediately apparent
to me why it's good code modularization to have a contrib module
responsible for checking sizes based on the sizeof a struct it's not
supposed to have any access to.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/17/2014 02:54 PM, Alvaro Herrera wrote:
 Hmm, this seems to contradict what's documented at the definition of
 FLEXIBLE_ARRAY_MEMBER:

 Well, there's a bunch of these in the source:

Yeah, I did the same grep and noted the same thing --- but on second look,
I think those are all structs that are defined without use of
FLEXIBLE_ARRAY_MEMBER, which OTOH *is* used in struct shm_mq.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Alvaro Herrera
Andrew Dunstan escribió:
 
 On 01/17/2014 02:54 PM, Alvaro Herrera wrote:

 Hmm, this seems to contradict what's documented at the definition of
 FLEXIBLE_ARRAY_MEMBER:

 Well, there's a bunch of these in the source:

AFAICT these are all defined with non-zero, non-empty array sizes, not
FLEXIBLE_ARRAY_MEMBER (which mq_ring is).  I don't know why that makes a
difference but apparently it does.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Gregory Smith

On 1/17/14 10:37 AM, Mel Gorman wrote:
There is not an easy way to tell. To be 100%, it would require an 
instrumentation patch or a systemtap script to detect when a 
particular page is being written back and track the context. There are 
approximations though. Monitor nr_dirty pages over time.


I have a benchmarking wrapper for the pgbench testing program called 
pgbench-tools:  https://github.com/gregs1104/pgbench-tools  As of 
October, on Linux it now plots the Dirty value from /proc/meminfo over 
time.  You get that on the same time axis as the transaction latency 
data.  The report at the end includes things like the maximum amount of 
dirty memory observed during the test sampling. That doesn't tell you 
exactly what's happening to the level someone reworking the kernel logic 
might want, but you can easily see things like the database's checkpoint 
cycle reflected by watching the dirty memory total.  This works really 
well for monitoring production servers too.  I have a lot of data from a 
plugin for the Munin monitoring system that plots the same way.  Once 
you have some history about what's normal, it's easy to see when systems 
fall behind in a way that's ruining writes, and the high water mark 
often correlates with bad responsiveness periods.


Another recent change is that pgbench for the upcoming PostgreSQL 9.4 
now allows you to specify a target transaction rate.  Seeing the write 
latency behavior with that in place is far more interesting than 
anything we were able to watch with pgbench before.  The pgbench write 
tests we've been doing for years mainly told you the throughput rate 
when all of the caches were always as full as the database could make 
them, and tuning for that is not very useful. Turns out it's far more 
interesting to run at 50% of what the storage is capable of, then watch 
what happens to latency when you adjust things like the dirty_* parameters.


I've been working on the problem of how we can make a benchmark test 
case that acts enough like real busy PostgreSQL servers that we can 
share it with kernel developers, and then everyone has an objective way 
to measure changes.  These rate limited tests are working much better 
for that than anything I came up with before.


I am skeptical that the database will take over very much of this work 
and perform better than the Linux kernel does.  My take is that our most 
useful role would be providing test cases kernel developers can add to a 
performance regression suite.  Ugly "we never thought that would happen" 
situations seem to be at the root of many of the kernel performance 
regressions people here get nailed by.


Effective I/O scheduling is very hard, and we are unlikely to ever out 
innovate the kernel hacking community by pulling more of that into the 
database.  It's already possible to experiment with moving in that 
direction with tuning changes.  Use a larger database shared_buffers 
value, tweak checkpoints to spread I/O out, and reduce things like 
dirty_ratio.  I do some of that, but I've learned it's dangerous to 
wander too far that way.


If instead you let Linux do even more work--give it a lot of memory to 
manage and room to re-order I/O--that can work out quite well. For 
example, I've seen a lot of people try to keep latency down by using the 
deadline scheduler and very low settings for the expire times.  Theory 
is great, but it never works out in the real world for me though.  
Here's the sort of deadline I deploy instead now:


echo 500 > ${DEV}/queue/iosched/read_expire
echo 30 > ${DEV}/queue/iosched/write_expire
echo 1048576 > ${DEV}/queue/iosched/writes_starved

These numbers look insane compared to the defaults, but I assure you 
they're from a server that's happily chugging through 5 to 10K 
transactions/second around the clock.  PostgreSQL forces writes out with 
fsync when they must go out, but this sort of tuning is basically giving 
up on it managing writes beyond that.  We really have no idea what order 
they should go out in.  I just let the kernel have a large pile of work 
queued up, and trust things like the kernel's block elevator and 
congestion code are smarter than the database can possibly be.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/




Re: [HACKERS] [PATCH] pgcrypto: implement gen_random_uuid

2014-01-17 Thread Tom Lane
Emre Hasegeli e...@hasegeli.com writes:
 2014/1/9 Oskari Saarenmaa o...@ohmu.fi:
 The only useful feature of the uuid-ossp module in my opinion is the
 uuid_generate_v4 function and as uuid-ossp is more or less abandonware
 people have had trouble building and installing it.  This patch implements
 an alternative uuid v4 generation function in pgcrypto which could be moved
 to core once there's a core PRNG with large enough internal state.

 It is a small but very useful patch. Installing uuid-ossp can be very hard
 on some systems. There is not much to review. The patch applies cleanly to
 HEAD. The function is generating valid UUID version 4. The code and
 the documentation style seems to fit in the pgcrypto extension. I am marking
 it as Ready for Committer.

 The problem is that users probably would not look to the pgcrypto extension
 for a UUID generator, especially when there is another extension with uuid in
 its name. Also, a UUID generator does not sound like a cryptographic function.
 It would be much better if this were in core with the UUID type. There
 is a reference on the UUID type documentation page to the uuid-ossp
 extension. We can add a reference to the pgcrypto extension on that page and
 consider adding a deprecation note to the uuid-ossp extension, if it is not
 possible to add the function to core, for now.

Well, we're not pulling pgcrypto into core in the foreseeable future;
there are legal (export control) issues that make that too risky.
Even aside from that, there was general consensus when type uuid went
in that the various generation algorithms were, how shall I say it, too
intellectually unsatisfying to be part of the core code.  So I think from
a code standpoint this solution is just fine.  I agree that we need some
extra work on the documentation to point people towards this approach
instead of uuid-ossp, though.  I'll take care of that and commit.
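
For reference, the essence of version-4 generation is just overwriting six
bits of otherwise-random data per RFC 4122.  A hedged standalone sketch (not
the pgcrypto code itself, and using a non-cryptographic random source here
purely for illustration):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Fill 16 bytes, then stamp the version (4) and variant (10xxxxxx) bits. */
    static void
    make_uuid_v4(uint8_t u[16])
    {
        int     i;

        for (i = 0; i < 16; i++)
            u[i] = (uint8_t) (rand() & 0xff);   /* illustration only; use a CSPRNG */
        u[6] = (u[6] & 0x0f) | 0x40;            /* version 4 */
        u[8] = (u[8] & 0x3f) | 0x80;            /* RFC 4122 variant */
    }

    int
    main(void)
    {
        uint8_t u[16];
        int     i;

        make_uuid_v4(u);
        for (i = 0; i < 16; i++)
            printf("%02x%s", u[i], (i == 3 || i == 5 || i == 7 || i == 9) ? "-" : "");
        putchar('\n');
        return 0;
    }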

regards, tom lane




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread Florian Pflug
On Jan17, 2014, at 20:34 , David Rowley dgrowle...@gmail.com wrote:
 On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug f...@phlo.org wrote:
 
 I've now shuffled things around so that we can use inverse transition 
 functions
 even if only some aggregates provide them, and to allow inverse transition
 functions to force a restart by returning NULL. The rules regarding NULL 
 handling
 are now the following
 
 Maybe this is me thinking out loud, but I'm just thinking about the numeric 
 case again.
 Since the patch can now handle inverse transition functions returning NULL 
 when they
 fail to perform inverse transitions, I'm wondering if we could add an 
 expectedscale
 to NumericAggState, set it to -1 initially, when we get the first value set 
 the
 expectedscale to the dscale of that numeric, then if we get anything but that 
 value
 we'll set the expectedscale back to -1 again, if we are asked to perform an 
 inverse
 transition with a expectedscale as -1 we'll return null, otherwise we can 
 perform
 the inverse transition...

You could do better than that - the numeric problem amounts to tracking the 
maximum
scale AFAICS, so it'd be sufficient to restart if we remove a value whose scale 
equals
the current maximum. But if we do SUM(numeric) at all, I think we should do so
without requiring restarts - I still think the overhead of tracking the maximum 
scale
within the aggregated values isn't that bad. If we zero the dscale counters 
lazily,
the number of entries we have to zero is bounded by the maximum dscale we 
encounter.
Since actually summing N digits is probably more expensive than zeroing them, 
and since
we pay the zeroing overhead only once per aggregation but save potentially many
summations, I'm pretty sure we come out ahead by quite some margin.
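
A rough standalone sketch of that counter idea, with made-up names (the real
NumericAggState layout and the lazy-zeroing details would differ):

    #define MAX_TRACKED_DSCALE  1000    /* assumption; matches PG's display scale limit */

    typedef struct DScaleTracker
    {
        int     count[MAX_TRACKED_DSCALE + 1];  /* zeroed up front here, lazily in the real thing */
        int     maxDScale;                      /* largest dscale currently present */
    } DScaleTracker;

    static void
    dscale_add(DScaleTracker *t, int dscale)
    {
        if (dscale > t->maxDScale)
            t->maxDScale = dscale;
        t->count[dscale]++;
    }

    static void
    dscale_remove(DScaleTracker *t, int dscale)
    {
        if (--t->count[dscale] == 0 && dscale == t->maxDScale)
        {
            /* cheap downward scan for the new maximum, so no restart is needed */
            while (t->maxDScale > 0 && t->count[t->maxDScale] == 0)
                t->maxDScale--;
        }
    }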

It'd be interesting to do float() similar to the way you describe, though. We 
might
not be able to guarantee that we yield exactly the same result as without 
inverse
aggregation, but we might be able to bound the error. That might make it 
acceptable
to do this - as Kevin pointed out, float is always an approximation anyway. I 
haven't
really thought that through, though...

Anyway, with time running out fast if we want to get this into 9.4, I think we 
should
focus on getting this into a committable state right now.

I've started to look over what you've pushed to github, and it looks mostly 
fine.
I have a few comments - mostly cosmetic stuff - that I'll post once I finished 
reading
through it. I also plan to do some basic performance testing to verify that my
reshuffling of eval_windowaggregates() doesn't hurt aggregates without an 
inverse
transition function.

best regards,
Florian Pflug





Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Andrew Dunstan


On 01/17/2014 03:15 PM, Tom Lane wrote:


The other possibility I was contemplating is that "export a const
variable" doesn't actually work for some reason.  We're not in the habit
of doing that elsewhere, so I don't find that theory outlandish.  Perhaps
it could be fixed by adding PGDLLIMPORT to the extern, but on the whole
I'd rather avoid the technique altogether.

The least-unlike-other-Postgres-code approach would be to go ahead and
export the struct so that the size computation could be provided as a
#define in the same header.  Robert stated a couple days ago that he
didn't foresee much churn in this struct, so that doesn't seem
unacceptable.

Another possibility is to refactor so that testing an allocation request
against shm_mq_minimum_size is the responsibility of storage/ipc/shm_mq.c,
not some random code in a contrib module.  It's not immediately apparent
to me why it's good code modularization to have a contrib module
responsible for checking sizes based on the sizeof a struct it's not
supposed to have any access to.





Or maybe we could expose the value via a function rather than a const 
variable.


cheers

andrew





Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread David Rowley
On Sat, Jan 18, 2014 at 10:17 AM, Florian Pflug f...@phlo.org wrote:

 On Jan17, 2014, at 20:34 , David Rowley dgrowle...@gmail.com wrote:
  On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug f...@phlo.org wrote:
 
  I've now shuffled things around so that we can use inverse transition
 functions
  even if only some aggregates provide them, and to allow inverse
 transition
  functions to force a restart by returning NULL. The rules regarding
 NULL handling
  are now the following
 
  Maybe this is me thinking out loud, but I'm just thinking about the
 numeric case again.
  Since the patch can now handle inverse transition functions returning
 NULL when they
  fail to perform inverse transitions, I'm wondering if we could add an
 expectedscale
  to NumericAggState, set it to -1 initially, when we get the first value
 set the
  expectedscale to the dscale of that numeric, then if we get anything but
 that value
  we'll set the expectedscale back to -1 again, if we are asked to perform
 an inverse
  transition with a expectedscale as -1 we'll return null, otherwise we
 can perform
  the inverse transition...

 You could do better than that - the numeric problem amounts to tracking
 the maximum
 scale AFAICS, so it'd be sufficient to restart if we remove a value whose
 scale equals
 the current maximum. But if we do SUM(numeric) at all, I think we should
 do so
 without requiring restarts - I still think the overhead of tracking the
 maximum scale
 within the aggregated values isn't that bad. If we zero the dscale
 counters lazily,
 the numbers of entries we have to zero is bound by the maximum dscale we
 encounter.
 Since actually summing N digits is probably more expensive than zeroing
 them, and since
 we pay the zeroing overhead only once per aggregation but save potentially
 many
 summations, I'm pretty sure we come out ahead by quite some margin.


We'll work that out, I don't think it will take very long to code up your
idea either.
I just thought that my idea was good enough and very cheap too, won't all
numerics that are actually stored in a column have the same scale anyway?
Hasn't it only been a problem because we've been testing with random
numeric literals the whole time?

The test turned out to become:
if (state->expectedScale == -1)
    state->expectedScale = X.dscale;
else if (state->expectedScale != X.dscale)
    state->expectedScale = -2;

In do_numeric_accum, then when we do the inverse I just check if
expectedScale < 0 then return NULL.
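
For concreteness, the inverse side would then look roughly like this (a
sketch using the names from this discussion, not the final patch):

    Datum
    numeric_accum_inv(PG_FUNCTION_ARGS)
    {
        NumericAggState *state = (NumericAggState *) PG_GETARG_POINTER(0);

        if (state->expectedScale < 0)
            PG_RETURN_NULL();   /* scale changed at some point: force a restart */

        /* ... otherwise subtract the removed input from the running sums ... */
        PG_RETURN_POINTER(state);
    }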

I'm not set on it, and I'm willing to try the lazy zeroing of the scale
tracker array, but I'm just not quite sure what extra real use cases it
covers that the one above does not. Perhaps my way won't perform inverse
transitions if the query did sum(numericCol / 10) or so.

I'll be committing this to the github repo very soon, so we can hack away
at the scale tracking code once it's back in.

David Rowley


Re: [HACKERS] Triggers on foreign tables

2014-01-17 Thread Noah Misch
On Tue, Jan 07, 2014 at 12:11:28PM +0100, Ronan Dunklau wrote:
 Since the last discussion about it 
 (http://www.postgresql.org/message-id/cadyhksugp6ojb1pybtimop3s5fg_yokguto-7rbcltnvaj5...@mail.gmail.com),
  I
 finally managed to implement row triggers for foreign tables.

 For statement-level triggers, nothing has changed since the last patch.
 Statement-level triggers are just allowed in the command checking.
 
 For row-level triggers, it works as follows:
 
 - during rewrite phase, every attribute of the foreign table is duplicated as
 a resjunk entry if a trigger is defined on the relation. These entries are
 then used to store the old values for a tuple. There is room for improvement
 here: we could check if any trigger is in fact a row-level trigger

The need here is awfully similar to a need of INSTEAD OF triggers on views.
For them, we add a single wholerow resjunk TLE.  Is there a good reason to
do it differently for triggers on foreign tables?

 - for after triggers, the whole queuing mechanism is bypassed for foreign
 tables. This is IMO acceptable, since foreign tables cannot have constraints
 or constraint triggers, and thus have no need for deferrable execution. This
 design avoids the need for storing and retrieving/identifying remote tuples
 until the query or transaction end.

Whether an AFTER ROW trigger is deferred determines whether it runs at the end
of the firing query or at the end of the firing query's transaction.  In all
cases, every BEFORE ROW trigger of a given query fires before any AFTER ROW
trigger of the same query.  SQL requires that.  This proposal would give
foreign table AFTER ROW triggers a novel firing time; let's not do that.

I think the options going forward are either (a) design a way to queue foreign
table AFTER ROW triggers such that we can get the old and/or new rows at the
end of the query or (b) not support AFTER ROW triggers on foreign tables for
the time being.

It's not clear to me whether SQL/MED contemplates triggers on foreign tables.
Its "drop basic column definition" General Rules do mention the possibility of
a reference from a trigger column list.  On the other hand, I see nothing
overriding the fact that CREATE TRIGGER only targets base tables.  Is this
clearer to anyone else?  (This is a minor point, mainly bearing on the
Compatibility section of the CREATE TRIGGER documentation.)

 - the duplicated resjunk attributes are identified by being:
   - marked as resjunk in the targetlist
   - not being system or whole-row attributes (varno > 0)
 
 There is still one small issue with the attached patch: modifications to the
 tuple performed by the foreign data wrapper (via the returned TupleTableSlot
 in ExecForeignUpdate and ExecForeignInsert hooks) are not visible to the AFTER
 trigger. This could be fixed by merging the planslot containing the resjunk
 columns with the returned slot before calling the trigger, but I'm not really
 sure how to safely perform that. Any advice ?

Currently, FDWs are permitted to skip returning columns not actually
referenced by any RETURNING clause.  I would change that part of the API
contract to require returning all columns when an AFTER ROW trigger is
involved.  You can't get around doing that by merging old column values,
because, among other reasons, an INSERT does not have those values at all.

On Tue, Jan 07, 2014 at 05:31:10PM +0100, Ronan Dunklau wrote:
 --- a/contrib/postgres_fdw/expected/postgres_fdw.out
 +++ b/contrib/postgres_fdw/expected/postgres_fdw.out

 +NOTICE:  TG_NAME: trig_row_after
 +NOTICE:  TG_WHEN: AFTER
 +NOTICE:  TG_LEVEL: ROW
 +NOTICE:  TG_OP: DELETE
 +NOTICE:  TG_RELID::regclass: rem1
 +NOTICE:  TG_RELNAME: rem1
 +NOTICE:  TG_TABLE_NAME: rem1
 +NOTICE:  TG_TABLE_SCHEMA: public
 +NOTICE:  TG_NARGS: 2
 +NOTICE:  TG_ARGV: [23, skidoo]
 +NOTICE:  OLD: (2,bye)
 +NOTICE:  TG_NAME: trig_row_before
 +NOTICE:  TG_WHEN: BEFORE
 +NOTICE:  TG_LEVEL: ROW
 +NOTICE:  TG_OP: DELETE
 +NOTICE:  TG_RELID::regclass: rem1
 +NOTICE:  TG_RELNAME: rem1
 +NOTICE:  TG_TABLE_NAME: rem1
 +NOTICE:  TG_TABLE_SCHEMA: public
 +NOTICE:  TG_NARGS: 2
 +NOTICE:  TG_ARGV: [23, skidoo]
 +NOTICE:  OLD: (11,bye remote)
 +NOTICE:  TG_NAME: trig_row_after
 +NOTICE:  TG_WHEN: AFTER
 +NOTICE:  TG_LEVEL: ROW
 +NOTICE:  TG_OP: DELETE
 +NOTICE:  TG_RELID::regclass: rem1
 +NOTICE:  TG_RELNAME: rem1
 +NOTICE:  TG_TABLE_NAME: rem1
 +NOTICE:  TG_TABLE_SCHEMA: public
 +NOTICE:  TG_NARGS: 2
 +NOTICE:  TG_ARGV: [23, skidoo]
 +NOTICE:  OLD: (11,bye remote)
 +insert into rem1 values(1,'insert');

Would you trim the verbosity a bit?  Maybe merge several of the TG_ fields
onto one line, and remove the low-importance ones.  Perhaps issue one line
like this in place of all the current TG_ lines:

  NOTICE:  trig_row_after(23, skidoo) AFTER ROW DELETE ON rem1

 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql
 +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
 @@ -384,3 +384,188 @@ insert into loc1(f2) values('bye');
  insert into rem1(f2) values('bye 

Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread David Rowley
On Sat, Jan 18, 2014 at 10:42 AM, David Rowley dgrowle...@gmail.com wrote:


 You could do better than that - the numeric problem amounts to tracking
 the maximum
 scale AFAICS, so it'd be sufficient to restart if we remove a value whose
 scale equals
 the current maximum. But if we do SUM(numeric) at all, I think we should
 do so
 without requiring restarts - I still think the overhead of tracking the
 maximum scale
 within the aggregated values isn't that bad. If we zero the dscale
 counters lazily,
 the numbers of entries we have to zero is bound by the maximum dscale we
 encounter.
 Since actually summing N digits is probably more expensive than zeroing
 them, and since
 we pay the zeroing overhead only once per aggregation but save
 potentially many
 summations, I'm pretty sure we come out ahead by quite some margin.


 We'll work that out, I don't think it will take very long to code up your
 idea either.
 I just thought that my idea was good enough and very cheap too, won't all
 numerics that are actually stored in a column have the same scale anyway?
 Is it not only been a problem because we've been testing with random
 numeric literals the whole time?

 The test turned out to become:
 if (state->expectedScale == -1)
 state->expectedScale = X.dscale;
  else if (state->expectedScale != X.dscale)
 state->expectedScale = -2;

 In do_numeric_accum, then when we do the inverse I just check if
 expectedScale < 0 then return NULL.

 I'm not set on it, and I'm willing to try the lazy zeroing of the scale
 tracker array, but I'm just not quite sure what extra real use cases it
 covers that the one above does not. Perhaps my way won't perform inverse
 transitions if the query did sum(numericCol / 10) or so.

 I'll be committing this to the github repo very soon, so we can hack away
 at the scale tracking code once it's back in.


Ok, we're back up to 86 aggregate function / type combinations with inverse
transition functions.
I've commited my latest work up to github and here's a fresh patch which I
will need to do more tests on.

The test (below) that used to fail a few versions ago is back in there and
it's now passing.

SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND
UNBOUNDED FOLLOWING)
  FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);

In this case it won't use inverse transitions because the forward
transition function detects that the scale is not fixed.

I tested throwing some numerics into a table and I'm pretty happy with the
results.

create table num (num numeric(10,2) not null);
insert into num (num) select * from generate_series(1,2);
select sum(num) over(order by num rows between current row and unbounded
following) from num; -- 124ms
select sum(num / 10) over(order by num rows between current row and
unbounded following) from num; -- 254ms
select sum(num / 1) over(order by num rows between current row and
unbounded following) from num; -- 108156.917 ms

The divide by 1 case is slow because of that weird 20 trailing zero instead
of 16 when dividing a numeric by 1 and that causes the inverse transition
function to return NULL because the scale changes.

I've not tested an unpatched version yet to see how that divide by 1 query
performs on that but I'll get to that soon.

I'm thinking that the idea about detecting the numeric range with floating
point types and performing an inverse transition provided the range has
not gone beyond some defined danger zone is not material for this patch...
I think it would be not a great deal of work to code, but the testing
involved would probably make this patch not possible for 9.4

The latest version of the patch is attached.

Regards

David Rowley


inverse_transition_functions_v2.7.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread Tom Lane
David Rowley dgrowle...@gmail.com writes:
 I just thought that my idea was good enough and very cheap too, won't all
 numerics that are actually stored in a column have the same scale anyway?

No; unconstrained numeric columns (ie, if you just say numeric) don't
force their contents to any particular scale.  It might be that we don't
have to optimize for that case, since it's not in the SQL spec, but it
is definitely supported by PG.

regards, tom lane




[HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Greg Stark
On Fri, Jan 17, 2014 at 9:14 AM, Andres Freund and...@2ndquadrant.com wrote:
 The scheme that'd allow us is the following:
 When postgres reads a data page, it will continue to first look up the
 page in its shared buffers, if it's not there, it will perform a page
 cache backed read, but instruct that read to immediately remove from the
 page cache afterwards (new API or, posix_fadvise() or whatever). As long
 as it's in shared_buffers, postgres will not need to issue new reads, so
 there's no benefit keeping it in the page cache.
 If the page is dirtied, it will be written out normally, telling the
 kernel to forget about caching the page (using 3) or possibly direct
 io).
 When a page in postgres's buffers (which wouldn't be set to very large
 values) isn't needed anymore and *not* dirty, it will seed the kernel
 page cache with the current data.

This seems like mostly an exact reimplementation of DIRECT_IO
semantics using posix_fadvise.

-- 
greg




Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Andres Freund
Hi,

On 2014-01-17 13:50:08 -0500, Tom Lane wrote:
 I think a better solution approach is to teach our src/port/snprintf.c
 about the z flag, then extend configure's checking to force use of our
 snprintf if the platform's version doesn't handle z.  While it might be
 objected that this creates a need for a run-time check in configure,
 we already have one to check if snprintf handles n$, so this approach
 doesn't really make life any worse for cross-compiles.

Hm. I had thought about that, but dismissed it because I thought people
would argue about it being too invasive...
If we're going there, we should just eliminate expand_fmt_string() from
elog.c and test for it in configure too, right?

 You suggest below that we could invent some additional
 macros to support that; but since the z flag is in C99, there really
 ought to be only a small minority of platforms where it doesn't work.

Well, maybe just a minority numberwise, but one of them being windows
surely makes it count in number of installations...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Andres Freund
On 2014-01-17 14:18:55 -0500, Tom Lane wrote:
 I wrote:
  Meh.  This isn't needed if we do what I suggest above, but in any case
  I don't approve of removing the existing [U]INT64_FORMAT macros.
  That breaks code that doesn't need to get broken, probably including
  third-party modules.
 
 After looking more closely I see you didn't actually *remove* those
 macros, just define them in a different place/way.  So the above objection
 is just noise, sorry.  (Though I think it'd be notationally cleaner to let
 configure continue to define the macros; it doesn't need to do anything as
 ugly as CppAsString2() to concatenate...)

I prefer having configure just define the length modifier since that
allows us to define further macros containing formats. But I think defining
them as strings instead of raw literals, as I had, might make it a bit less ugly...
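
Something along these lines, presumably (the macro names here are
assumptions, just to show how a configure-supplied length modifier composes
into format strings):

    /* configure would substitute the right modifier for the platform,
       e.g. "l", "ll", or "I64" on Windows: */
    #define INT64_MODIFIER  "ll"

    /* further format macros are then simple string concatenations: */
    #define INT64_FORMAT    "%" INT64_MODIFIER "d"
    #define UINT64_FORMAT   "%" INT64_MODIFIER "u"

Everything downstream would keep using INT64_FORMAT unchanged.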

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Andres Freund
On 2014-01-17 13:50:08 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  [ 0001-Add-support-for-printing-Size-arguments-to-elog-erep.patch ]
 
 I think this approach is fundamentally broken, because it can't reasonably
 cope with any case more complicated than %zu or %zd.  While it's
 arguable that we can get along without the ability to specify field width,
 since the [U]INT64_FORMAT macros never allowed that, it is surely going
 to be the case that translators will need to insert n$ flags in the
 translated versions of messages.

Am I just too tired, or am I not getting how INT64_FORMAT currently
allows the arguments to be used positionally?

Admittedly most places currently just cast down the value avoiding the
need for INT64_FORMAT in many places.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Andres Freund
On 2014-01-17 16:18:49 -0800, Greg Stark wrote:
 On Fri, Jan 17, 2014 at 9:14 AM, Andres Freund and...@2ndquadrant.com wrote:
  The scheme that'd allow us is the following:
  When postgres reads a data page, it will continue to first look up the
  page in its shared buffers, if it's not there, it will perform a page
  cache backed read, but instruct that read to immediately remove from the
  page cache afterwards (new API or, posix_fadvise() or whatever). As long
  as it's in shared_buffers, postgres will not need to issue new reads, so
  there's no benefit keeping it in the page cache.
  If the page is dirtied, it will be written out normally, telling the
  kernel to forget about caching the page (using 3) or possibly direct
  io).
  When a page in postgres's buffers (which wouldn't be set to very large
  values) isn't needed anymore and *not* dirty, it will seed the kernel
  page cache with the current data.
 
 This seems like mostly an exact reimplementation of DIRECT_IO
 semantics using posix_fadvise.

Not at all. The significant bits are a) the kernel uses the pagecache
for writeback of dirty data and more importantly b) uses the pagecache
to cache data not in postgres's s_b.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-17 13:50:08 -0500, Tom Lane wrote:
 I think a better solution approach is to teach our src/port/snprintf.c
 about the z flag, then extend configure's checking to force use of our
 snprintf if the platform's version doesn't handle z.

 Hm. I had thought about that, but dismissed it because I thought people
 would argue about it being too invasive...

How so?  It'd be a lot less invasive than what we'd have to do to use
'z' flags the other way.

 If we're going there, we should just eliminate expand_fmt_string() from
 elog.c and test for it in configure too, right?

If you mean "let's rely on glibc for %m", the answer is not bloody
likely.  See useful_strerror(), which is functionality we'd lose if
we short-circuit that.

 You suggest below that we could invent some additional
 macros to support that; but since the z flag is in C99, there really
 ought to be only a small minority of platforms where it doesn't work.

 Well, maybe just a minority numberwise, but one of them being windows
 surely makes it count in number of installations...

Agreed, but I believe we're already using src/port/snprintf.c on Windows
because of lack of %n$ support (which is also required by C99).
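
The run-time probe could look much like the existing %n$ one; a hedged sketch
of such a test program (not the actual configure snippet):

    #include <stdio.h>
    #include <string.h>
    #include <stddef.h>

    /* Exit status 0 means the platform snprintf understands the z flag. */
    int
    main(void)
    {
        char    buf[32];

        snprintf(buf, sizeof(buf), "%zu", (size_t) 123456);
        return strcmp(buf, "123456") != 0;
    }

A non-zero exit would tell configure to fall back to src/port/snprintf.c.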

regards, tom lane




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread Florian Pflug
First, I've got the feeling that I should somehow update the commitfest app,
but I don't really know in which way. Should I put myself in as a reviewer,
or as a second author? Or neither? Suggestions welcome...

On Jan17, 2014, at 23:34 , David Rowley dgrowle...@gmail.com wrote:
 The test turned out to become:
   if (state->expectedScale == -1)
   state->expectedScale = X.dscale;
   else if (state->expectedScale != X.dscale)
   state->expectedScale = -2;
 
 In do_numeric_accum, then when we do the inverse I just check if
 expectedScale < 0 then return NULL.

Ok, so this will rescan if and only if the dscales of all inputs match.
I still think that's overly pessimistic - we've only got a problem when we
remove the input with the largest dscale, no? So my suggestion would be

  state->maxDScale = MAX(X.dscale, state->maxDScale);

in do_numeric_accum, and in the inverse

  if (state->maxDScale == X.dscale)
      PG_RETURN_NULL();

I'd think that this avoids more restarts with about the same effort,
but I haven't tried this though, so maybe I'm missing something.

 I'm not set on it, and I'm willing to try the lazy zeroing of the scale
 tracker array, but I'm just not quite sure what extra real use cases it
 covers that the one above does not. Perhaps my way won't perform inverse
 transitions if the query did sum(numericCol / 10) or so.

Dunno how many people SUM over numerics with different dscales. Easily
possible that it's few enough to not be worth fussing over.

 create table num (num numeric(10,2) not null);
 insert into num (num) select * from generate_series(1,2);
 select sum(num) over(order by num rows between current row and unbounded 
 following) from num; -- 124ms
 select sum(num / 10) over(order by num rows between current row and unbounded 
 following) from num; -- 254ms
 select sum(num / 1) over(order by num rows between current row and unbounded 
 following) from num; -- 108156.917 ms
 
 The divide by 1 case is slow because of that weird 20 trailing zero
 instead of 16 when dividing a numeric by 1 and that causes the inverse
 transition function to return NULL because the scale changes.
 
 I've not tested an unpatched version yet to see how that divide by 1 query
 performs on that but I'll get to that soon.

So without the patch, all three queries should perform simiarly, i.e. take
about 10 seconds, right? If so, the improvement is fantastic!

 I'm thinking that the idea about detecting the numeric range with floating
 point types and performing an inverse transition providing the range has
 not gone beyond some defined danger zone is not material for this patch...
 I think it would be not a great deal of work to code, but the testing involved
 would probably make this patch not possible for 9.4

Yeah, I never imagined that this would happen for 9.4.

 The latest version of the patch is attached.

OK, there are a few more comments

* build_aggregate_fnexprs() should allow NULL to be passed for invtransfn_oid,
  I think. I don't quite like that we construct that even for plain aggregates,
  and avoiding that requires just an additional if.

* Don't we need to check for volatile function in the filter expression too?

* As it stands, I don't think intXand_inv and intXor_inv are worth it, since
  the case where they return non-NULL is awfully slim (only for inputs
  containing only ones, respectively only zeros). We should either track the number
  of zeros and ones per bit, which would allow us to always perform inverse
  transitions, or just drop them. 

* Quite a few of the inverse transition functions are marked strict, yet
  contain code to handle NULL inputs. You can just remove that code - the system
  makes sure that strict functions never receive NULL arguments. Affected are,
  AFAICS numeric_accum_inv, numeric_avg_accum_inv, int2_accum_inv, 
  int4_accum_inv, int8_accum_inv, int8_avg_accum_inv, int2_sum_inv, 
int4_sum_inv,
  int8_sum_inv. Not sure that list is exhaustive...

* For any of the new inverse transition functions, I'd be inclined to just
  elog() if they're called directly and not as an aggregate. In particular
  those which check for that anyway, plus the *smaller_inv and *larger_inv
  ones. I don't see why anyone would ever want to call these directly - and
  if we elog() now, we can easy change these functions later, because no 
external
  code can depend on them. E.g., maybe someone wants to keep the top N elements
  in the MIN/MAX aggregates one day...

* The number of new regression tests seems a bit excessive. I don't think there's
  really a policy on what to test and what not, but in this case I think it 
suffices
  if we test the basic machinery, plus the more complex functions. But e.g. most
  of the SUM and AVG aggregates use numeric_accum or numeric_avg_accum 
internally,
  and the wrapper code basically just does datatype conversion, so testing a few
  cases seems enough there. What I think we *should* test, but don't do 
currently,
  is whether 

Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-17 13:50:08 -0500, Tom Lane wrote:
 I think this approach is fundamentally broken, because it can't reasonably
 cope with any case more complicated than %zu or %zd.

 Am I just too tired, or am I not getting how INT64_FORMAT currently
 allows the arguments to be used posititional?

It doesn't, which is one of the reasons for not allowing it in
translatable strings (the other being lack of standardization of the
strings that would be subject to translation).  Adding 'z' will only
fix this for cases where what we want to print is really a size_t.
That's a usefully large set, but not all the cases by any means.

regards, tom lane




Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Tom Lane
I wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-17 13:50:08 -0500, Tom Lane wrote:
 I think this approach is fundamentally broken, because it can't reasonably
 cope with any case more complicated than %zu or %zd.

 Am I just too tired, or am I not getting how INT64_FORMAT currently
 allows the arguments to be used posititional?

 It doesn't, which is one of the reasons for not allowing it in
 translatable strings (the other being lack of standardization of the
 strings that would be subject to translation).

On second thought, that answer was too glib.  There's no need for %n$
in the format strings *in the source code*, so INT64_FORMAT isn't getting
in the way from that perspective.  However, expand_fmt_string() is
necessarily applied to formats *after* they've been through gettext(),
so it has to expect that it might see %n$ in the now-translated strings.

It's still the case that we need a policy against INT64_FORMAT in
translatable strings, though, because what's passed to the gettext
mechanism has to be a fixed string not something with platform
dependencies in it.  Or so I would assume, anyway.

regards, tom lane




Re: [HACKERS] Minor improvements to sslinfo contrib module

2014-01-17 Thread Tom Lane
Gurjeet Singh gurj...@singh.im writes:
 Please find attached the patch that fixes a couple of comments, and adds
 'static' qualifier to functions that are not used anywhere else in the code
 base.

Committed, with the trivial fix that the function definitions have to be
marked static too.  (gcc doesn't complain about this, which I think may be
correct per C standard; but some other compilers do whine about that sort
of inconsistency.)

regards, tom lane




Re: [HACKERS] array_length(anyarray)

2014-01-17 Thread Marko Tiikkaja

On 1/12/14, 5:53 AM, I wrote:

On 1/9/14, 2:57 PM, Dean Rasheed wrote:

How it should behave for multi-dimensional arrays is less clear, but
I'd argue that it should return the total number of elements, i.e.
cardinality('{{1,2},{3,4}}'::int[][]) = 4. That would make it
consistent with the choices we've already made for unnest() and
ordinality:
   - cardinality(foo) = (select count(*) from unnest(foo)).
   - unnest with ordinality would always result in ordinals in the range
[1, cardinality].


Ignoring my proposal, this seems like the most reasonable option.  I'll
send an updated patch along these lines.


Here's the patch as promised.  Thoughts?


Regards,
Marko Tiikkaja
*** a/doc/src/sgml/array.sgml
--- b/doc/src/sgml/array.sgml
***
*** 338,343  SELECT array_length(schedule, 1) FROM sal_emp WHERE name = 
'Carol';
--- 338,356 
  2
  (1 row)
  /programlisting
+ 
+  functioncardinality/function returns the total number of elements in an
+  array across all dimensions.  It is effectively the number of rows a call to
+  functionunnest/function would yield:
+ 
+ programlisting
+ SELECT cardinality(schedule) FROM sal_emp WHERE name = 'Carol';
+ 
+  cardinality 
+ -
+4
+ (1 row)
+ /programlisting
   /para
   /sect2
  
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 11009,11014  SELECT NULLIF(value, '(none)') ...
--- 11009,11017 
  primaryarray_upper/primary
/indexterm
indexterm
+ primarycardinality/primary
+   /indexterm
+   indexterm
  primarystring_to_array/primary
/indexterm
indexterm
***
*** 11167,11172  SELECT NULLIF(value, '(none)') ...
--- 11170,11186 
 row
  entry
   literal
+   functioncardinality/function(typeanyarray/type)
+  /literal
+ /entry
+ entrytypeint/type/entry
+ entryreturns the total number of elements in the array, or 0 if the 
array is empty/entry
+ entryliteralcardinality(ARRAY[[1,2],[3,4]])/literal/entry
+ entryliteral4/literal/entry
+/row
+row
+ entry
+  literal
functionstring_to_array/function(typetext/type, 
typetext/type optional, typetext/type/optional)
   /literal
  /entry
*** a/src/backend/utils/adt/arrayfuncs.c
--- b/src/backend/utils/adt/arrayfuncs.c
***
*** 1740,1745  array_length(PG_FUNCTION_ARGS)
--- 1740,1766 
  }
  
  /*
+  * array_cardinality:
+  *returns the total number of elements in an array
+  */
+ Datum
+ array_cardinality(PG_FUNCTION_ARGS)
+ {
+   ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);
+   int*dimv;
+   int i;
+   int cardinality;
+ 
+   dimv = ARR_DIMS(v);
+   cardinality = 1;
+   for (i = 0; i < ARR_NDIM(v); i++)
+   cardinality *= dimv[i];
+ 
+   PG_RETURN_INT32(cardinality);
+ }
+ 
+ 
+ /*
   * array_ref :
   *  This routine takes an array pointer and a subscript array and returns
   *  the referenced item as a Datum.  Note that for a pass-by-reference
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***
*** 840,845  DATA(insert OID = 2092 (  array_upper PGNSP PGUID 12 1 0 0 
0 f f f f t f i 2
--- 840,847 
  DESCR(array upper dimension);
  DATA(insert OID = 2176 (  array_length   PGNSP PGUID 12 1 0 0 0 f f f 
f t f i 2 0 23 2277 23 _null_ _null_ _null_ _null_ array_length _null_ _null_ 
_null_ ));
  DESCR(array length);
+ DATA(insert OID = 3179 (  cardinalityPGNSP PGUID 12 1 0 0 0 f f f f t f i 
1 0 23 2277 _null_ _null_ _null_ _null_ array_cardinality _null_ _null_ 
_null_ ));
+ DESCR(array cardinality);
  DATA(insert OID = 378 (  array_appendPGNSP PGUID 12 1 0 0 0 f f f f f f i 
2 0 2277 2277 2283 _null_ _null_ _null_ _null_ array_push _null_ _null_ 
_null_ ));
  DESCR(append element onto end of array);
  DATA(insert OID = 379 (  array_prepend   PGNSP PGUID 12 1 0 0 0 f f f 
f f f i 2 0 2277 2283 2277 _null_ _null_ _null_ _null_ array_push _null_ 
_null_ _null_ ));
*** a/src/include/utils/array.h
--- b/src/include/utils/array.h
***
*** 204,209  extern Datum array_dims(PG_FUNCTION_ARGS);
--- 204,210 
  extern Datum array_lower(PG_FUNCTION_ARGS);
  extern Datum array_upper(PG_FUNCTION_ARGS);
  extern Datum array_length(PG_FUNCTION_ARGS);
+ extern Datum array_cardinality(PG_FUNCTION_ARGS);
  extern Datum array_larger(PG_FUNCTION_ARGS);
  extern Datum array_smaller(PG_FUNCTION_ARGS);
  extern Datum generate_subscripts(PG_FUNCTION_ARGS);
*** a/src/test/regress/expected/arrays.out
--- b/src/test/regress/expected/arrays.out
***
*** 1455,1460  select array_length(array[[1,2,3], [4,5,6]], 3);
--- 1455,1502 
   
  (1 row)
  
+ select cardinality(NULL::int[]);
+  cardinality 
+ -
+ 
+ (1 row)
+ 
+ select 

Re: [HACKERS] [PATCH] Make various variables read-only (const)

2014-01-17 Thread Tom Lane
Oskari Saarenmaa o...@ohmu.fi writes:
 On Sun, Dec 22, 2013 at 09:43:57PM -0500, Robert Haas wrote:
 - Why change the API of transformRelOptions()?

 The comment was changed to reflect the new API, I modified
 transformRelOptions to only accept a single valid namespace to make things
 simpler in the calling code.  Nothing used more than one valid namespace
 anyway, and it allows us to just use a constant "toast" without having to
 create a two-element char* array with a NULL.

That may be true today, but the code was obviously designed to allow for
more than one namespace in the future.  I'm inclined to reject this part
of the patch, or at least rework it to const-ify the existing data
structure rather than editorialize on what's needed.

However, I believe this code was Alvaro's to begin with, so I'd be
interested in his opinion on how important it is for transformRelOptions
to allow more than one namespace per call.

Actually, I'm kind of wondering why the code has a concept of namespaces
at all, given the fact that it fails to store them in the resulting array.
It seems impossible to verify after the fact that an option was given with
the right namespace, so isn't the feature pretty much pointless?

regards, tom lane




Re: [HACKERS] 9.3.2 Client-only installation per docs fails creating directory.

2014-01-17 Thread Peter Eisentraut
On Thu, 2014-01-16 at 15:59 -0600, Mike Blackwell wrote:
 Per the docs (15.4 Installation Procedure, Client-only installation),
 after running make, the following should work:
 
 
 gmake -C src/bin install
 gmake -C src/include install
 gmake -C src/interfaces install
 gmake -C doc install

Fixed, thanks.

(It works if you have NLS enabled, which is why this was probably not
reported before.)







Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread David Rowley
On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug f...@phlo.org wrote:

 First, I've go the feeling that I should somehow update the commitfest app,
 but I don't really know in which way. Should I put myself in as a reviewer,
 or as a second author? Or neither? Suggestions welcome...


Well, I guess you're both now, but it's a bit weird to be author and reviewer,
so I've put your name down as author too; hopefully Dean can review our
combined results and we can review each other's work at the same time.


 On Jan17, 2014, at 23:34 , David Rowley dgrowle...@gmail.com wrote:
  The test turned out to become:
  if (state->expectedScale == -1)
  state->expectedScale = X.dscale;
  else if (state->expectedScale != X.dscale)
  state->expectedScale = -2;
 
  In do_numeric_accum, then when we do the inverse I just check if
  expectedScale < 0 then return NULL.

 Ok, so this will rescan if and only if the dscales of all inputs match.
 I still that's overly pessimistic - we've only got a problem when we
 removed the input with the largest dscale, no? So my suggestion would be

    state->maxDScale = MAX(X.dscale, state->maxDScale);

 in do_numeric_accum, and in the inverse

    if (state->maxDScale == X.dscale)
        PG_RETURN_NULL();

 I'd think that this avoids more restarts without about the same effort,
 but I haven't tried this though, so maybe I'm missing something.


This is not quite right as it means if all the values are the same then
we reject inverse transitions since state->maxScale will always be equal to
X.dscale.
But you are right about the overly strict code I've put in; we should allow
values with a dscale less than the maximum to be unaggregated without
complaint. To implement this I needed a maxScaleCounter variable too, so we
only reject when the maxScaleCounter gets back to 0 again.
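
Spelled out, that fix amounts to something like the following fragment (same
field names as described here; a sketch in the style of the snippets above,
not the committed code):

    /* forward transition */
    if (X.dscale > state->maxScale)
    {
        state->maxScale = X.dscale;
        state->maxScaleCounter = 1;
    }
    else if (X.dscale == state->maxScale)
        state->maxScaleCounter++;

    /* inverse transition: only bail out when the last max-scale input leaves */
    if (X.dscale == state->maxScale && --state->maxScaleCounter == 0)
        PG_RETURN_NULL();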

Note that after this fix the results for my quick benchmark now look like:

create table num (num numeric(10,2) not null);
insert into num (num) select * from generate_series(1,2);
select sum(num) over(order by num rows between current row and unbounded
following) from num; -- 113 ms
select sum(num / 10) over(order by num rows between current row and
unbounded following) from num; -- 156ms
select sum(num / 1) over(order by num rows between current row and
unbounded following) from num; -- 144 ms

So it seems to be much less prone to falling back to brute force forward
transitions.
It also seems the / 10 version must have had to previously do 1 brute force
rescan but now it looks like it can do it in 1 scan.

 I'm not set on it, and I'm willing to try the lazy zeroing of the scale
  tracker array, but I'm just not quite sure what extra real use cases it
  covers that the one above does not. Perhaps my way won't perform inverse
  transitions if the query did sum(numericCol / 10) or so.

 Dunno how many people SUM over numerics with different dscales. Easily
 possible that it's few enough to not be worth fussing over.


Going by Tom's comments in the post above this is possible just by having
an unconstrained numeric column, but I guess there's still a good chance
that even those unconstrained numbers have the same scale or at least the
scale will likely not vary wildly enough to make us have to perform brute
force forward transitions for each row.


  create table num (num numeric(10,2) not null);
  insert into num (num) select * from generate_series(1,2);
  select sum(num) over(order by num rows between current row and unbounded
 following) from num; -- 124ms
  select sum(num / 10) over(order by num rows between current row and
 unbounded following) from num; -- 254ms
  select sum(num / 1) over(order by num rows between current row and
 unbounded following) from num; -- 108156.917 ms
 
  The divide by 1 case is slow because of that weird 20 trailing zero
  instead of 16 when dividing a numeric by 1 and that causes the inverse
  transition function to return NULL because the scale changes.
 
  I've not tested an unpatched version yet to see how that divide by 1
 query
  performs on that but I'll get to that soon.

 So without the patch, all three queries should perform simiarly, i.e. take
 about 10 seconds, right? If so, the improvement is fantastic!


Well, it's actually 100 seconds, not 10. I tested the worst-case
performance against an unpatched head and got 107 seconds instead of the
108. So I'm guessing that's pretty good, as the worst case is not really any
worse, and the worst case is pretty hard to get to. I guess the results
would have to all have a different scale with the biggest scale on the
first aggregated values... Reaching that worst case just seems impossible
in a real world workload.


  I'm thinking that the idea about detecting the numeric range with
 floating
  point types and performing an inverse transition providing the range has
  not gone beyond some defined danger zone is not material for this
 patch...
  I think it would be not a great deal of work to code, but 

Re: [HACKERS] [PATCH] Make various variables read-only (const)

2014-01-17 Thread Tom Lane
I wrote:
 However, I believe this code was Alvaro's to begin with, so I'd be
 interested in his opinion on how important it is for transformRelOptions
 to allow more than one namespace per call.

 Actually, I'm kind of wondering why the code has a concept of namespaces
 at all, given the fact that it fails to store them in the resulting array.
 It seems impossible to verify after the fact that an option was given with
 the right namespace, so isn't the feature pretty much pointless?

After thinking about it some more, I realize that the intended usage
pattern is to call transformRelOptions once for each allowed namespace.
Since each call would ignore options outside its namespace, that would
result in any options with invalid namespace names being silently ignored;
so the fix was to add a parameter saying which namespaces are valid,
and then each call checks that, independently of extracting the options
it cares about.

This design seems a bit odd to me; it's certainly wasting effort, since
each namespace check after the first one is redundant.  I'm inclined to
propose that we factor out the namespace validity checking into a separate
function, such that you do something like

void checkOptionNamespaces(List *defList, const char * const validnsps[])

just once, and then call transformRelOptions for each of the desired
namespaces; transformRelOptions's validnsps argument goes away.  In at
least some places this looks like it would net out cleaner; for instance,
there is no good reason why callers that are trying to extract toast
options should have to know which other namespaces are allowed.
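
For concreteness, a sketch of what such a helper might look like (the
function itself is hypothetical; the loop body mirrors the kind of
namespace validation transformRelOptions already does, with the usual
backend includes assumed):

    void
    checkOptionNamespaces(List *defList, const char *const validnsps[])
    {
        ListCell   *cell;

        foreach(cell, defList)
        {
            DefElem    *def = (DefElem *) lfirst(cell);
            bool        valid = (def->defnamespace == NULL);
            int         i;

            for (i = 0; !valid && validnsps && validnsps[i] != NULL; i++)
            {
                if (pg_strcasecmp(def->defnamespace, validnsps[i]) == 0)
                    valid = true;
            }

            if (!valid)
                ereport(ERROR,
                        (errcode(ERRCODE_SYNTAX_ERROR),
                         errmsg("unrecognized parameter namespace \"%s\"",
                                def->defnamespace)));
        }
    }

Callers would invoke this once up front and then call transformRelOptions
per namespace without passing the whole list of valid namespaces around.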

regards, tom lane




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread David Rowley
On Sat, Jan 18, 2014 at 6:15 PM, David Rowley dgrowle...@gmail.com wrote:

 On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug f...@phlo.org wrote:

 * Don't we need to check for volatile function in the filter expression
 too?



 I did manual testing on this before and the volatility test for the
 aggregate arguments seems to cover this. I didn't look into why but it just
 did. I've not tested this again since your refactoring. I could test this
 easily before when my numeric case was changing the results because of the
 dscale problem, I noticed that if I did FILTER(WHERE random() > 0) that the
 extra trailing zeros would disappear.
 The problem now is that it's pretty hard to determine if an inverse
 transition took place, the only way we can really tell is performance. I'll
 see if I can invent a new test case for this by creating a user defined
 aggregate as you described. I'm thinking just append '+' to a string for
 transitions and '-' to a string for inverse transitions, then just make
 sure we only have a string of '+'s when doing something like filter(where
 random() >= 0).



I've added a test case for this and it seem work as expected:
https://github.com/david-rowley/postgres/commit/43a5021e8f8ae1af272e7e21a842d1b0d5cbe577

Regards

David Rowley


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Amit Kapila
On Sat, Jan 18, 2014 at 2:48 AM, Andrew Dunstan and...@dunslane.net wrote:

 On 01/17/2014 03:15 PM, Tom Lane wrote:

  The other possibility I was contemplating is that "export a const
  variable" doesn't actually work for some reason.  We're not in the habit
 of doing that elsewhere, so I don't find that theory outlandish.  Perhaps
 it could be fixed by adding PGDLLIMPORT to the extern, but on the whole
 I'd rather avoid the technique altogether.

Is there any specific reason why adding PGDLLIMPORT for exporting
const variables is not good, when we are already doing so for other variables
like InterruptHoldoffCount, CritSectionCount?

On searching in code, I found that for few const variables we do
extern PGDLLIMPORT. For example:
extern PGDLLIMPORT const int NumScanKeywords;
extern PGDLLIMPORT const char *debug_query_string;

 The least-unlike-other-Postgres-code approach would be to go ahead and
 export the struct so that the size computation could be provided as a
 #define in the same header.  Robert stated a couple days ago that he
 didn't foresee much churn in this struct, so that doesn't seem
 unacceptable.

 Another possibility is to refactor so that testing an allocation request
 against shm_mq_minimum_size is the responsibility of storage/ipc/shm_mq.c,
 not some random code in a contrib module.  It's not immediately apparent
 to me why it's good code modularization to have a contrib module
 responsible for checking sizes based on the sizeof a struct it's not
 supposed to have any access to.

 Or maybe we could expose the value via a function rather than a const
 variable.

All of the above suggested alternatives will fix this problem. However, after
fixing it, when I ran the test further it crashed the bgworker,
and I found that the reason was that some other variables
(ImmediateInterruptOK, MyBgworkerEntry) are used in the test module without
PGDLLIMPORT.

After adding PGDLLIMPORT to variables (ImmediateInterruptOK,
MyBgworkerEntry, shm_mq_minimum_size) both the tests defined in
contrib module passed.
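
For reference, the gist of such a fix is just decorating the relevant
externs in the backend headers, along these lines (a sketch of the idea,
not the attached patch itself):

    extern PGDLLIMPORT volatile bool ImmediateInterruptOK;
    extern PGDLLIMPORT BackgroundWorker *MyBgworkerEntry;
    extern PGDLLIMPORT const Size shm_mq_minimum_size;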


Attached patch fixes the problems related to test_shm_mq for me.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_test_shm_mq.patch
Description: Binary data
