[HACKERS] [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea,text,etc)

2014-11-30 Thread Craig Ringer
Hi all

Currently the client must know the size of a large lob/clob field, like
a 'bytea' or 'text' field, in order to send it to the server. This can
force the client to buffer all the data before sending it to the server.

It would be helpful if the v4 protocol permitted the client to specify
the field length as unknown / TBD, then stream data until an end marker
is read. Some encoding would be required for binary data to ensure that
occurrences of the end marker in the streamed data were properly
handled, but there are many well established schemes for doing this.
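
For concreteness, here is a minimal sketch of one possible byte-stuffing scheme; the marker and escape values and the helper names are purely hypothetical and not part of any existing protocol:

#include <stdio.h>

/* Hypothetical end-of-field sentinel and escape prefix. */
#define END_MARKER  0x00
#define ESCAPE_BYTE 0x01

/*
 * Stream one chunk of field data, prefixing any byte that collides with
 * the sentinel or the escape byte, so the receiver never mistakes payload
 * bytes for the end of the field.
 */
static void
send_chunk(FILE *out, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] == END_MARKER || buf[i] == ESCAPE_BYTE)
            fputc(ESCAPE_BYTE, out);    /* escape the colliding byte */
        fputc(buf[i], out);
    }
}

/* A bare, unescaped END_MARKER terminates the field. */
static void
end_field(FILE *out)
{
    fputc(END_MARKER, out);
}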

I'm aware that this is possible for pg_largeobject, but this is with
reference to big varlena fields.

This would be a useful change to have in connection with the
already-TODO'd lazy fetching of large TOASTed values, as part of a
general improvement in Pg's handling of big values in tuples.

Thoughts/comments?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-11-30 Thread Dilip kumar
On 24 November 2014 11:29, Amit Kapila Wrote,

>I think still some of the comments given upthread are not handled:
>
>a.  About cancel handling

Your actual comment was:

>One other related point is that I think the cancel handling mechanism is
>still not completely right; the code does it when there are not enough
>free slots, but otherwise it won't handle the cancel request. Basically I
>am referring to the below part of the code:

run_parallel_vacuum()
{
..
for (cell = tables->head; cell; cell = cell->next)
{
..
free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
completedb);
…
PQsendQuery(slotconn, sql.data);

resetPQExpBuffer(&sql);
}



1.  The connection blocks only while waiting for an idle slot or while in
GetQueryResult, so the cancel request is handled in both of those places.

2.  As you mentioned, when there are enough slots and the above for loop is
running, a Ctrl+C from the user would not break out of it. I have handled
this by checking inAbort inside the for loop before sending each new
command. I don't think we can do this just by sending the cancel request,
because a query only realizes it has been cancelled (and fails) when it
next receives data; until the connection receives data it will not see the
failure. So I check inAbort directly.
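

For illustration, the pattern in point 2 looks roughly like the standalone
sketch below; only the flag name inAbort mirrors the patch, the rest (the
signal handler and the loop body) is a generic stand-in for the dispatch
loop, not the actual vacuumdb code:

#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t inAbort = 0;

static void
handle_sigint(int signum)
{
    (void) signum;
    inAbort = 1;            /* remember that the user pressed Ctrl+C */
}

int
main(void)
{
    signal(SIGINT, handle_sigint);

    for (int i = 0; i < 100; i++)
    {
        /* check before dispatching each new command */
        if (inAbort)
        {
            fprintf(stderr, "cancelled, not sending further commands\n");
            break;
        }
        printf("dispatching command %d\n", i);  /* stands in for PQsendQuery */
    }
    return 0;
}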



>b.  Setting of inAbort flag for case where PQCancel is successful

Your actual comment was:

>Yeah, user has tried to terminate, however utility will emit the
>message: "Could not send cancel request" in such a case and still
>silently tries to cancel and disconnect all connections.

You are right. I have fixed the code so that the inAbort flag is no longer
set when the cancel request fails.

>c.  Performance data of --analyze-in-stages switch



Performance Data
--
CPU 8 cores
RAM = 16GB
checkpoint_segments=256

Before each test, run the test.sql (attached)

Un-patched -
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb  -p 9005  
--analyze-in-stages  -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics

real    0m0.843s
user    0m0.000s
sys     0m0.000s

Patched
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb  -p 9005  
--analyze-in-stages  -j 2 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics

real    0m0.593s
user    0m0.004s
sys     0m0.004s

dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb  -p 9005  
--analyze-in-stages  -j 4 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics

real    0m0.421s
user    0m0.004s
sys     0m0.004s

With 2 connections the runtime drops from 0.843s to 0.593s, roughly a 30%
improvement (and with 4 connections to 0.421s, roughly 50%).

>d.  Have one pass over the comments in the patch.  I could still see some
>wrong multiline comments.  Refer below:
>+  /* Otherwise, we got a stage from vacuum_all_databases(), so run
>+   * only that one. */

I have checked all the comments and fixed them.

While testing, I found one more behavioral difference compared to the base code:

Base Code:
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb  -p 9005 -t "t1" 
-t "t2" -t "t3" -t "t4" --analyze-in-stages  -d Postgres

Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics

real    0m0.605s
user    0m0.004s
sys     0m0.000s

I think it should be like this:
SET default_statistics_target=1; then analyze all the tables;
SET default_statistics_target=10; then analyze all the tables; and so on.

With Patched
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb  -p 9005 -t "t1" 
-t "t2" -t "t3" -t "t4" --analyze-in-stages -j 2 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics

real    0m0.395s
user    0m0.000s
sys     0m0.004s

Here we set each target once and run it for all the tables.

Please provide your opinion.


Regards,
Dilip Kumar





test.sql
Description: test.sql


vacuumdb_parallel_v19.patch
Description: vacuumdb_parallel_v19.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Log inability to lock pages during vacuum

2014-11-30 Thread Jim Nasby

On 11/10/14, 7:52 PM, Tom Lane wrote:

On the whole, I'm +1 for just logging the events and seeing what we learn
that way.  That seems like an appropriate amount of effort for finding out
whether there is really an issue.


Attached is a patch that does this.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
>From a8e824900d7c68e2c242b28c9c06c854f01b770a Mon Sep 17 00:00:00 2001
From: Jim Nasby 
Date: Sun, 30 Nov 2014 20:43:47 -0600
Subject: [PATCH] Log cleanup lock acquisition failures in vacuum

---

Notes:
Count how many times we fail to grab the page cleanup lock on the first try,
logging it with different wording depending on whether scan_all is true.

 doc/src/sgml/ref/vacuum.sgml  | 1 +
 src/backend/commands/vacuumlazy.c | 8 
 2 files changed, 9 insertions(+)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 450c94f..1272c1c 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -252,6 +252,7 @@ DETAIL:  CPU 0.01s/0.06u sec elapsed 0.07 sec.
 INFO:  "onek": found 3000 removable, 1000 nonremovable tuples in 143 pages
 DETAIL:  0 dead tuples cannot be removed yet.
 There were 0 unused item pointers.
+Could not acquire cleanup lock on 0 pages.
 0 pages are entirely empty.
 CPU 0.07s/0.39u sec elapsed 1.56 sec.
 INFO:  analyzing "public.onek"
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 6db6c5c..8f22ed2 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -105,6 +105,8 @@ typedef struct LVRelStats
 	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
 	BlockNumber rel_pages;		/* total number of pages */
 	BlockNumber scanned_pages;	/* number of pages we examined */
+	/* number of pages we could not initially get lock on */
+	BlockNumber nolock;
 	double		scanned_tuples; /* counts only tuples on scanned pages */
 	double		old_rel_tuples; /* previous value of pg_class.reltuples */
 	double		new_rel_tuples; /* new estimated total # of tuples */
@@ -346,6 +348,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 			ereport(LOG,
 					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
 							"pages: %d removed, %d remain\n"
+							"%s cleanup lock on %u pages.\n"
 							"tuples: %.0f removed, %.0f remain, %.0f are dead but not yet removable\n"
 							"buffer usage: %d hits, %d misses, %d dirtied\n"
 							"avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"
@@ -356,6 +359,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 							vacrelstats->num_index_scans,
 							vacrelstats->pages_removed,
 							vacrelstats->rel_pages,
+							scan_all ? "Waited for" : "Could not acquire", vacrelstats->nolock,
 							vacrelstats->tuples_deleted,
 							vacrelstats->new_rel_tuples,
 							vacrelstats->new_dead_tuples,
@@ -611,6 +615,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		/* We need buffer cleanup lock so that we can prune HOT chains. */
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
+			vacrelstats->nolock++;
+
 			/*
 			 * If we're not scanning the whole relation to guard against XID
 			 * wraparound, it's OK to skip vacuuming a page.  The next vacuum
@@ -1101,10 +1107,12 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 					vacrelstats->scanned_pages, nblocks),
 			 errdetail("%.0f dead row versions cannot be removed yet.\n"
 					   "There were %.0f unused item pointers.\n"
+					   "%s cleanup lock on %u pages.\n"
 					   "%u pages are entirely empty.\n"
 					   "%s.",
 					   nkeep,
 					   nunused,
+					   scan_all ? "Waited for" : "Could not acquire", vacrelstats->nolock,
 					   empty_pages,
 					   pg

Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-11-30 Thread Peter Geoghegan
On Mon, Nov 17, 2014 at 11:39 PM, Jeff Davis  wrote:
> I can also just move isReset there, and keep mem_allocated as a uint64.
> That way, if I find later that I want to track the aggregated value for
> the child contexts as well, I can split it into two uint32s. I'll hold
> off on any such optimizations until I see some numbers from HashAgg
> though.

I took a quick look at memory-accounting-v8.patch.

Is there some reason why mem_allocated is a uint64? All other things
being equal, I'd follow the example of tuplesort.c's
MemoryContextAllocHuge() API, which (following bugfix commit
79e0f87a1) uses int64 variables to track available memory and so on.
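
(A side note on why a signed counter is convenient for this kind of
bookkeeping; the snippet below is a toy illustration only, not code from
tuplesort.c or from the patch: a signed int64 can meaningfully go negative
when an allocation overshoots the budget, whereas an unsigned counter
silently wraps.)

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    int64_t  availMem  = 1024 * 1024;       /* 1MB budget, signed */
    uint64_t availMemU = 1024 * 1024;       /* same budget, unsigned */

    availMem  -= 3 * 1024 * 1024;           /* a 3MB allocation overshoots */
    availMemU -= 3 * 1024 * 1024;

    if (availMem < 0)                       /* meaningful with a signed type */
        printf("over budget by %lld bytes\n", (long long) -availMem);

    printf("unsigned counter wrapped to %llu\n",
           (unsigned long long) availMemU);
    return 0;
}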

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buildfarm not happy with test module move

2014-11-30 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> All of the MSVC critters are failing at "make check".

> Yeah, I noticed that, thanks.  As far as I can see the only way to fix
> it is to install dummy_seclabel to run the core seclabel test.  That
> doesn't seem smart; I think it'd be better to remove that part of the
> core seclabel test, and move the rest of the test to a new test in the
> dummy_seclabel module.

Sounds good to me.  The other parts of the core tests that depend on
contrib modules aren't exactly good models to follow.

> That was my fault -- the alvh.no-ip.org domain was deleted, and the
> email system in postgresql.org rejected the commit message because the
> sender was not in a deliverable domain.  I noticed within a few hours,
> but the damage was already done.

I guess I'm confused about which is your preferred email address?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-11-30 Thread Peter Geoghegan
On Tue, Nov 25, 2014 at 4:13 PM, Peter Geoghegan  wrote:
> If I don't hear anything in the next day or two,
> I'll more or less preserve aliases-related aspects of the patch.

FYI, I didn't go ahead and work on this, because I thought that the
Thanksgiving holiday in the US probably kept you from giving feedback.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for inet operators

2014-11-30 Thread Tom Lane
Emre Hasegeli  writes:
>> Thanks. Overall, my impression of this patch is that it works very
>> well. But damned if I understood *how* it works :-). There's a lot
>> of statistics involved, and it's not easy to see why something is
>> multiplied by something else. I'm adding comments as I read through
>> it.

> Thank you for looking at it.  I tried to add more comments to
> the multiplications.  New version attached.  It also fixes a bug
> caused by wrong operator order used on histogram to histogram
> selectivity estimation for inner join.

I spent a fair chunk of the weekend hacking on this patch to make
it more understandable and fix up a lot of what seemed to me pretty
clear arithmetic errors in the "upper layers" of the patch.  However,
I couldn't quite convince myself to commit it, because the business
around estimation for partial histogram-bucket matches still doesn't
make any sense to me.  Specifically this:

/* Partial bucket match. */
left_divider = inet_hist_match_divider(left, query, opr_codenum);
right_divider = inet_hist_match_divider(right, query, opr_codenum);

if (left_divider >= 0 || right_divider >= 0)
match += 1.0 / pow(2.0, Max(left_divider, right_divider));

Now unless I'm missing something pretty basic about the divider
function, it returns larger numbers for inputs that are "further away"
from each other (ie, have more not-in-common significant address bits).
So the above calculation seems exactly backwards to me: if one endpoint
of a bucket is "close" to the query, or even an exact match, and the
other endpoint is further away, we completely ignore the close/exact
match and assign a bucket match fraction based only on the further-away
endpoint.  Isn't that exactly backwards?

I experimented with logic like this:

if (left_divider >= 0 && right_divider >= 0)
match += 1.0 / pow(2.0, Min(left_divider, right_divider));
else if (left_divider >= 0 || right_divider >= 0)
match += 1.0 / pow(2.0, Max(left_divider, right_divider));

ie, consider the closer endpoint if both are valid.  But that didn't seem
to work a whole lot better.  I think really we need to consider both
endpoints not just one to the exclusion of the other.

I'm also not exactly convinced by the divider function itself,
specifically about the decision to fail and return -1 if the masklen
comparison comes out wrong.  This effectively causes the masklen to be
the most significant part of the value (after the IP family), which seems
totally wrong.  ISTM we ought to consider the number of leading bits in
common as the primary indicator of "how far apart" a query and a
histogram endpoint are.
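
For reference, "number of leading bits in common" is just the length of the
shared bit prefix of the two addresses; a toy illustration of that measure
(not the patch's inet_hist_match_divider, whose details differ) could look
like this:

#include <stdio.h>

/* Count the leading bits two equal-length addresses have in common. */
static int
common_prefix_bits(const unsigned char *a, const unsigned char *b, int nbytes)
{
    int     bits = 0;

    for (int i = 0; i < nbytes; i++)
    {
        unsigned char diff = a[i] ^ b[i];

        if (diff == 0)
        {
            bits += 8;
            continue;
        }
        while ((diff & 0x80) == 0)      /* count matching high-order bits */
        {
            bits++;
            diff <<= 1;
        }
        break;
    }
    return bits;
}

int
main(void)
{
    unsigned char x[4] = {192, 168, 1, 10};     /* 192.168.1.10 */
    unsigned char y[4] = {192, 168, 100, 7};    /* 192.168.100.7 */

    printf("%d leading bits in common\n", common_prefix_bits(x, y, 4));
    return 0;
}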

Even if the above aspects of the code are really completely right, the
comments fail to explain why.  I spent a lot of time on the comments,
but so far as these points are concerned they still only explain what
is being done and not why it's a useful calculation to make.

Anyway, attached is my updated version of the patch.  (I did commit the
added #define in pg_operator.h, so that the patch can be independent of
that file in future.)  I've marked this "waiting on author" in the CF app.

regards, tom lane

diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index d0d806f..f854847 100644
*** a/src/backend/utils/adt/network_selfuncs.c
--- b/src/backend/utils/adt/network_selfuncs.c
***
*** 3,9 
   * network_selfuncs.c
   *	  Functions for selectivity estimation of inet/cidr operators
   *
!  * Currently these are just stubs, but we hope to do better soon.
   *
   * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
--- 3,11 
   * network_selfuncs.c
   *	  Functions for selectivity estimation of inet/cidr operators
   *
!  * This module provides estimators for the subnet inclusion and overlap
!  * operators.  Estimates are based on null fraction, most common values,
!  * and histogram of inet/cidr columns.
   *
   * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
***
*** 16,32 
   */
  #include "postgres.h"
  
  #include "utils/inet.h"
  
  
  Datum
  networksel(PG_FUNCTION_ARGS)
  {
! 	PG_RETURN_FLOAT8(0.001);
  }
  
  Datum
  networkjoinsel(PG_FUNCTION_ARGS)
  {
! 	PG_RETURN_FLOAT8(0.001);
  }
--- 18,949 
   */
  #include "postgres.h"
  
+ #include 
+ 
+ #include "access/htup_details.h"
+ #include "catalog/pg_operator.h"
+ #include "catalog/pg_statistic.h"
  #include "utils/inet.h"
+ #include "utils/lsyscache.h"
+ #include "utils/selfuncs.h"
+ 
+ 
+ /* Default selectivity for the inet overlap operator */
+ #define DEFAULT_OVERLAP_SEL 0.01
  
+ /* Default selectivity for the various inclusion operators */
+ #define DEFAULT_INCLUSI

Re: [HACKERS] Buildfarm not happy with test module move

2014-11-30 Thread Alvaro Herrera
Tom Lane wrote:
> All of the MSVC critters are failing at "make check".

Yeah, I noticed that, thanks.  As far as I can see the only way to fix
it is to install dummy_seclabel to run the core seclabel test.  That
doesn't seem smart; I think it'd be better to remove that part of the
core seclabel test, and move the rest of the test to a new test in the
dummy_seclabel module.


> (BTW, why has the commit message not come through on pgsql-committers?)

That was my fault -- the alvh.no-ip.org domain was deleted, and the
email system in postgresql.org rejected the commit message because the
sender was not in a deliverable domain.  I noticed within a few hours,
but the damage was already done.

-- 
 Álvaro Herrera                http://www.twitter.com/alvherre


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-11-30 Thread Peter Geoghegan
On Tue, Nov 25, 2014 at 4:01 AM, Robert Haas  wrote:
> There's a lot of stuff in this patch I'm still trying to digest

I spotted a bug in the most recent revision. Mea culpa.

I think that the new field Tuplesortstate.abbrevNext should be an
int64, not an int. The fact that Tuplesortstate.memtupcount is an int
is not reason enough to make abbrevNext an int -- after all, with the
patch applied tuplesort uses a doubling growth strategy in respect of
abbrevNext, whereas grow_memtuples() is very careful about integer
overflow when growing memtupcount. I suggest we follow the good
example of tuplesort_skiptuples() in making our "ntuples" variable
(Tuplesortstate.abbrevNext) an int64 instead. The alternative is to
add grow_memtuples()-style checks.
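
(To make the overflow concern concrete, here is a toy loop -- not the
tuplesort code itself; only the doubling pattern and the int64 width mirror
the discussion. With a plain int, repeated doubling eventually overflows,
which is undefined behaviour in C, whereas int64 leaves ample headroom for
any realistic tuple count.)

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    int64_t abbrevNext = 10;        /* tuple count at which to re-check */
    int64_t processed = 0;

    while (processed < 1000000)
    {
        processed++;
        if (processed == abbrevNext)
        {
            /* ... a cost-model check would go here ... */
            abbrevNext *= 2;        /* doubling growth strategy */
        }
    }
    printf("next check at %" PRId64 " tuples\n", abbrevNext);
    return 0;
}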

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Typo/spacing fix for "29.1. Reliability"

2014-11-30 Thread Ian Barwick

This fixes a missing space here:

  http://www.postgresql.org/docs/devel/static/wal-reliability.html

(present in 9.3 onwards).

Regards


Ian Barwick

--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
new file mode 100644
index 6172a08..1254c03
*** a/doc/src/sgml/wal.sgml
--- b/doc/src/sgml/wal.sgml
***
*** 194,200 
  
   
Data pages are not currently checksummed by default, though full page images
!   recorded in WAL records will be protected;  seeinitdb
for details about enabling data page checksums.
   
--- 194,200 
  
   
Data pages are not currently checksummed by default, though full page images
!   recorded in WAL records will be protected; see initdb
for details about enabling data page checksums.
   

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] BDR Consistency in Insert/Update pkey conflicts

2014-11-30 Thread Robert Berry
I'm curious to know what the expected behavior is for an insert / update
conflict on a primary key field.

The wiki suggested "divergence conflicts" would be logged and replication
would halt for the downstream master.  There is a case where the databases
"diverge" such that they are no longer consistent, and where no error is
logged and replication continues.

The test case is summarized as follows: insert a record on a first node with
pkey='x'.  After this record propagates, insert a record on a different node
with pkey='y', and at the same time update the first node's record to change
the key to 'y'.  Depending on timing, the result is that the first node will
have only a single record, while the other nodes will have both the 'x' and
'y' records.

The full test case is at:
https://github.com/no0p/bdrlab/blob/master/bdrsuite/tests/conflicts/insert_update_conflict.rb

Is this considered a divergence conflict and what is the expected behavior
for this case?

Also, is there a published list of conditions under which masters can become
inconsistent as expected behavior -- or should that never happen?

Best Regards,
Robert


Re: [HACKERS] [BUGS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

2014-11-30 Thread Andrew Dunstan


On 11/30/2014 04:31 PM, Tom Lane wrote:

Andrew Dunstan  writes:

OK, here's the patch.

Can we make IsValidJsonNumber() take a "const char *"?  Also its
comment should specify that it doesn't require nul-terminated
input, if indeed it doesn't.  Otherwise +1.





Then I have to cast away the const-ness when I assign it inside the 
function. Constifying the whole API would be a rather larger project, I 
suspect, assuming it's possible.
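
(For what it's worth, a sketch of what the const-qualified variant and the
cast in question might look like; the struct below is a simplified stand-in
for JsonLexContext, and the lexer call is elided:)

#include <stdbool.h>

typedef struct
{
    char   *input;          /* the real lexer struct stores a plain char * */
    int     input_length;
} DummyLex;

bool
IsValidJsonNumber(const char *str, int len)
{
    DummyLex    lex;

    /* casting away const-ness, since the lexer struct is not constified */
    lex.input = (char *) (*str == '-' ? str + 1 : str);
    lex.input_length = (*str == '-') ? len - 1 : len;

    /* ... json_lex_number(&lex, lex.input, &numeric_error) would go here ... */
    return true;            /* placeholder result */
}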


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

2014-11-30 Thread Tom Lane
Andrew Dunstan  writes:
> OK, here's the patch.

Can we make IsValidJsonNumber() take a "const char *"?  Also its
comment should specify that it doesn't require nul-terminated
input, if indeed it doesn't.  Otherwise +1.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

2014-11-30 Thread Andrew Dunstan


On 11/30/2014 11:45 AM, Tom Lane wrote:

Andrew Dunstan  writes:

what do you want to do about this? In the back branches, exposing a
function like this would be an API change, wouldn't it? Perhaps there we
just need to pick up the 100 lines or so involved from json.c and copy
them into hstore_io.c, suitably modified. In the development branch I
think adding the function to the API is the best way.

If we're going to do it by calling some newly-exposed function, I'd be
inclined to fix it the same way in the back branches.  Otherwise the
discrepancy between the branches is a big back-patching hazard.
(For instance, if we realize we need to fix a bug in the numeric-parsing
code, what are the odds that we remember to fix hstore's additional copy
in the back branches?)

The "API break" isn't a big issue imo.  The net effect would be that eg
hstore 9.3.6 wouldn't work against a 9.3.5 server.  We do that sort of
thing *all the time* --- at least twice in the past year, according to
a quick scan of the commit logs.  If you were changing or removing a
function that third-party code might depend on, it'd be problematic,
but an addition has no such risk.





OK, here's the patch.

cheers

andrew
diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 6ce3047..ee3d696 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -1240,7 +1240,6 @@ hstore_to_json_loose(PG_FUNCTION_ARGS)
 	int			count = HS_COUNT(in);
 	char	   *base = STRPTR(in);
 	HEntry	   *entries = ARRPTR(in);
-	bool		is_number;
 	StringInfoData tmp,
 dst;
 
@@ -1267,48 +1266,9 @@ hstore_to_json_loose(PG_FUNCTION_ARGS)
 			appendStringInfoString(&dst, "false");
 		else
 		{
-			is_number = false;
 			resetStringInfo(&tmp);
 			appendBinaryStringInfo(&tmp, HS_VAL(entries, base, i), HS_VALLEN(entries, i));
-
-			/*
-			 * don't treat something with a leading zero followed by another
-			 * digit as numeric - could be a zip code or similar
-			 */
-			 */
-			if (tmp.len > 0 &&
-				!(tmp.data[0] == '0' &&
-				  isdigit((unsigned char) tmp.data[1])) &&
-				strspn(tmp.data, "+-0123456789Ee.") == tmp.len)
-			{
-				/*
-				 * might be a number. See if we can input it as a numeric
-				 * value. Ignore any actual parsed value.
-				 */
-				char	   *endptr = "junk";
-				long		lval;
-
-				lval = strtol(tmp.data, &endptr, 10);
-				(void) lval;
-				if (*endptr == '\0')
-				{
-					/*
-					 * strol man page says this means the whole string is
-					 * valid
-					 */
-					is_number = true;
-				}
-				else
-				{
-					/* not an int - try a double */
-					double		dval;
-
-					dval = strtod(tmp.data, &endptr);
-					(void) dval;
-					if (*endptr == '\0')
-						is_number = true;
-				}
-			}
-			if (is_number)
+			if (IsValidJsonNumber(tmp.data, tmp.len))
 				appendBinaryStringInfo(&dst, tmp.data, tmp.len);
 			else
 				escape_json(&dst, tmp.data);
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d2bf640..271a2a4 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -173,6 +173,31 @@ lex_expect(JsonParseContext ctx, JsonLexContext *lex, JsonTokenType token)
 	 (c) == '_' || \
 	 IS_HIGHBIT_SET(c))
 
+/* utility function to check if a string is a valid JSON number */
+extern bool
+IsValidJsonNumber(char * str, int len)
+{
+	bool		numeric_error;
+	JsonLexContext dummy_lex;
+
+
+	/* json_lex_number expects a leading - to have been eaten already */
+	if (*str == '-')
+	{
+		dummy_lex.input = str + 1;
+		dummy_lex.input_length = len - 1;
+	}
+	else
+	{
+		dummy_lex.input = str;
+		dummy_lex.input_length = len;
+	}
+
+	json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error);
+
+	return ! numeric_error;
+}
+
 /*
  * Input.
  */
@@ -1338,8 +1363,6 @@ datum_to_json(Datum val, bool is_null, StringInfo result,
 {
 	char	   *outputstr;
 	text	   *jsontext;
-	bool		numeric_error;
-	JsonLexContext dummy_lex;
 
 	/* callers are expected to ensure that null keys are not passed in */
 	Assert( ! (key_scalar && is_null));
@@ -1376,25 +1399,14 @@ datum_to_json(Datum val, bool is_null, StringInfo result,
 			break;
 		case JSONTYPE_NUMERIC:
 			outputstr = OidOutputFunctionCall(outfuncoid, val);
-			if (key_scalar)
-			{
-				/* always quote keys */
-				escape_json(result, outputstr);
-			}
+			/*
+			 * Don't call escape_json for a non-key if it's a valid JSON
+			 * number.
+			 */
+			if (!key_scalar && IsValidJsonNumber(outputstr, strlen(outputstr)))
+				appendStringInfoString(result, outputstr);
 			else
-			{
-				/*
-				 * Don't call escape_json for a non-key if it's a valid JSON
-				 * number.
-				 */
-				dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr;
-				dummy_lex.input_length = strlen(dummy_lex.input);
-				json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error);
-				if (!numeric_error)
-					appendStringInfoString(result, outputstr);
-				else
-					escape_json(result, outputstr);
-			}
+escape_json(resu

Re: [HACKERS] Removing INNER JOINs

2014-11-30 Thread David Rowley
On 1 December 2014 at 06:51, Tom Lane  wrote:

> David Rowley  writes:
> > I see this is quite a fundamental change to how things currently work and
> > it could cause planning to take place during the execution of PREPAREd
> > statements, which might not impress people too much, but it would
> certainly
> > fix the weird anomalies that I'm currently facing by trimming the plan at
> > executor startup. e.g left over Sort nodes after a MergeJoin was removed.
>
> > It would be interesting to hear Tom's opinion on this.
>
> Another question is what effect this has on EXPLAIN; there's basically
> no way you can avoid lying to the user about what's going to happen at
> runtime.
>
>
One of us must be missing something here. As far as I can see, no lies are
told; the EXPLAIN shows exactly the plan that will be executed. All of the
regression tests I've added rely on this.

Regards

David Rowley


Re: [HACKERS] TODO item: Accept aliases for values in ROW(...) constructor

2014-11-30 Thread Pavel Stehule
Hi Craig

Is there agreement on the proposed syntax ROW(x AS something, y AS
somethingelse)?

I can start work on this topic this week.

Regards

Pavel

2014-11-25 2:33 GMT+01:00 Craig Ringer :
>
>
> > ROW(x AS something, y AS somethingelse)
>
> Apologies, it looks like Pavel already bought this up:
>
>
http://www.postgresql.org/message-id/cafj8prb1t1w6g0sppn-jetyzjpluuz_fxtnbme5okd3xxvf...@mail.gmail.com
>
> and I missed it.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing INNER JOINs

2014-11-30 Thread Tom Lane
David Rowley  writes:
> I see this is quite a fundamental change to how things currently work and
> it could cause planning to take place during the execution of PREPAREd
> statements, which might not impress people too much, but it would certainly
> fix the weird anomalies that I'm currently facing by trimming the plan at
> executor startup. e.g left over Sort nodes after a MergeJoin was removed.

> It would be interesting to hear Tom's opinion on this.

TBH I don't like this patch at all even in its current form, let alone
a form that's several times more invasive.  I do not think there is a
big enough use-case to justify such an ad-hoc and fundamentally different
way of doing things.  I think it's probably buggy as can be --- one thing
that definitely is a huge bug is that it modifies the plan tree in-place,
ignoring the rule that the plan tree is read-only to the executor.
Another question is what effect this has on EXPLAIN; there's basically
no way you can avoid lying to the user about what's going to happen at
runtime.

One idea you might think about to ameliorate those two objections is two
separate plan trees underneath an AlternativeSubPlan or similar kind of
node.

At a more macro level, there's the issue of how can the planner possibly
make intelligent decisions at other levels of the join tree when it
doesn't know the cost of this join.  For that matter there's nothing
particularly driving the planner to arrange the tree so that the
optimization is possible at all.

Bottom line, given all the restrictions on whether the optimization can
happen, I have very little enthusiasm for the whole idea.  I do not think
the benefit will be big enough to justify the amount of mess this will
introduce.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

2014-11-30 Thread Tom Lane
Andrew Dunstan  writes:
> what do you want to do about this? In the back branches, exposing a 
> function like this would be an API change, wouldn't it? Perhaps there we 
> just need to pick up the 100 lines or so involved from json.c and copy 
> them into hstore_io.c, suitably modified. In the development branch I 
> think adding the function to the API is the best way.

If we're going to do it by calling some newly-exposed function, I'd be
inclined to fix it the same way in the back branches.  Otherwise the
discrepancy between the branches is a big back-patching hazard.
(For instance, if we realize we need to fix a bug in the numeric-parsing
code, what are the odds that we remember to fix hstore's additional copy
in the back branches?)

The "API break" isn't a big issue imo.  The net effect would be that eg
hstore 9.3.6 wouldn't work against a 9.3.5 server.  We do that sort of
thing *all the time* --- at least twice in the past year, according to
a quick scan of the commit logs.  If you were changing or removing a
function that third-party code might depend on, it'd be problematic,
but an addition has no such risk.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

2014-11-30 Thread Andrew Dunstan


On 11/26/2014 11:48 AM, Andrew Dunstan wrote:


On 11/26/2014 11:19 AM, Tom Lane wrote:

bo...@edookit.com writes:
The hstore_to_json_loose(hstore) produces an invalid JSON in the 
following

case:
SELECT hstore_to_json_loose(hstore(ARRAY ['name'], ARRAY ['1.'] :: TEXT
[]))
Output: {"name": 1.}
The actual output is indeed incorrect as JSON does not permit `1.` - 
it must

be a string.

Yeah.  The problem seems to be the ad-hoc (I'm being polite) code in
hstore_to_json_loose to decide whether a string should be treated as a
number.  It does much more work than it needs to, and fails to have any
tight connection to the JSON syntax rules for numbers.

Offhand, it seems like the nicest fix would be if the core json code
exposed a function that would say whether a string matches the JSON
number syntax.  Does that functionality already exist someplace,
or is it too deeply buried in the JSON parser guts?

regards, tom lane






In json.c we now check numbers like this:

   JsonLexContext dummy_lex;
   bool        numeric_error;
   ...
   dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr;
   dummy_lex.input_length = strlen(dummy_lex.input);
   json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error);

numeric_error is false IFF outputstr is a legal json number.

Exposing a function to do this should be trivial.





Tom,

what do you want to do about this? In the back branches, exposing a 
function like this would be an API change, wouldn't it? Perhaps there we 
just need to pick up the 100 lines or so involved from json.c and copy 
them into hstore_io.c, suitably modified. In the development branch I 
think adding the function to the API is the best way.


I don't mind doing the work once we agree what is to be done - in the 
development branch it will be trivial.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing INNER JOINs

2014-11-30 Thread David Rowley
On 30 November 2014 at 23:19, Mart Kelder  wrote:

>
> I think performance can be greatly improved if the planner is able to use
> information based on the current data. I think these patches are just two
> examples of where assumptions during planning are useful. I think there
> are more possibilities for this kind of assumption (for example unique
> constraints, empty tables).
>
> The problem here is that assumptions made during planning might not hold
> during execution. That is why you placed the final decision about removing
> a join in the executor.
>
> Once a plan is made, you know which assumptions it relies on. In this case,
> the assumption is that a foreign key is still valid. In general there are
> many more, such as the continued existence of an index or of columns. There
> are also soft assumptions, such as that the statistics that were used are
> still reasonable.
>
>
Hi Mart,

That's an interesting idea, though I think it would be much harder to
decide whether it's a good idea to go off and replan for things like empty
tables, as that's not known at executor startup and may only be discovered
99% of the way through plan execution. In that case, going off and
replanning and starting execution all over again might throw away too much
hard work.

It does seem like a good idea for things that can be known at executor
start-up; I guess this would likely include LEFT JOIN removals using
deferrable unique indexes... Currently these indexes are ignored by the
join removal code, as they mightn't be unique until the transaction
finishes.

I'm imagining this being implemented by passing the planner a set of flags
for assumptions the planner is allowed to make... During planning, if it
generated a plan which required one of those assumptions to hold, it would
set the corresponding flag somewhere in the plan, which would force the
executor to check it at executor init. If the executor found that any
required flag's condition was not met, it would request a new plan, passing
all the original flags minus the ones whose conditions had been broken.
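
To sketch the shape of that idea (every name below is invented for
illustration; nothing like this exists in the patch or in core):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical assumptions a plan could be flagged with. */
typedef enum
{
    PLAN_ASSUMES_FK_VALID     = 1 << 0,   /* FK proves inner tuple exists */
    PLAN_ASSUMES_UNIQUE_INDEX = 1 << 1    /* deferrable unique index is unique now */
} PlanAssumptionFlag;

typedef struct
{
    int     assumption_flags;   /* set by the planner when it relies on one */
} PlannedStmtSketch;

/* Stub: a real check would consult catalog/executor state. */
static bool
assumption_still_holds(int flag)
{
    (void) flag;
    return true;
}

/* At executor init: verify each flagged assumption; replan if any broke. */
static void
executor_start_sketch(PlannedStmtSketch *stmt)
{
    int     broken = 0;

    for (int bit = 0; bit < 2; bit++)
    {
        int     flag = 1 << bit;

        if ((stmt->assumption_flags & flag) && !assumption_still_holds(flag))
            broken |= flag;
    }

    if (broken != 0)
        printf("request replan, disallowing flags 0x%x\n", broken);
    else
        printf("all assumptions hold; execute the plan as-is\n");
}

int
main(void)
{
    PlannedStmtSketch stmt = { PLAN_ASSUMES_FK_VALID };

    executor_start_sketch(&stmt);
    return 0;
}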

I see this is quite a fundamental change to how things currently work and
it could cause planning to take place during the execution of PREPAREd
statements, which might not impress people too much, but it would certainly
fix the weird anomalies that I'm currently facing by trimming the plan at
executor startup. e.g left over Sort nodes after a MergeJoin was removed.

It would be interesting to hear Tom's opinion on this.

Regards

David Rowley


Re: [HACKERS] Removing INNER JOINs

2014-11-30 Thread Mart Kelder
Hi David (and others),

David Rowley wrote:
> Hi,
> 
> Starting a new thread which continues on from
> http://www.postgresql.org/message-id/caaphdvoec8ygwoahvsri-84en2k0tnh6gpxp1k59y9juc1w...@mail.gmail.com
> 
> To give a brief summary for any new readers:
> 
> The attached patch allows for INNER JOINed relations to be removed from
> the plan, providing none of the columns are used for anything, and a
> foreign key exists which proves that a record must exist in the table
> being removed which matches the join condition:
> 
> I'm looking for a bit of feedback around the method I'm using to prune the
> redundant plan nodes out of the plan tree at executor startup.
> Particularly around not stripping the Sort nodes out from below a merge
> join, even if the sort order is no longer required due to the merge join
> node being removed. This potentially could leave the plan suboptimal when
> compared to a plan that the planner could generate when the removed
> relation was never asked for in the first place.

I did read this patch (and the previous patch about removing SEMI-joins)
with great interest. I don't know the code well enough to say much about the
patch itself, but I hope to have some useful ideas about the global process.

I think performance can be greatly improved if the planner is able to use
information based on the current data. I think these patches are just two
examples of where assumptions during planning are useful. I think there are
more possibilities for this kind of assumption (for example unique
constraints, empty tables).

> There are some more details around the reasons behind doing this weird
> executor startup plan pruning around here:
> 
> http://www.postgresql.org/message-id/20141006145957.ga20...@awork2.anarazel.de

The problem here is that assumptions made during planning might not hold
during execution. That is why you placed the final decision about removing a
join in the executor.

Once a plan is made, you know which assumptions it relies on. In this case,
the assumption is that a foreign key is still valid. In general there are
many more, such as the continued existence of an index or of columns. There
are also soft assumptions, such as that the statistics that were used are
still reasonable.

My suggestion is to check the assumptions at the start of the executor. If
they still hold, you can just execute the plan as it is.

If one or more assumptions don't hold, there are a couple of things you
might do:
* Make a new plan. The plan is certain to match all conditions because at
that time a snapshot has already been taken.
* Check the assumption. This can be a costly operation with no guarantee of
success.
* Change the existing plan to not rely on the failed assumption.
* Use an already stored alternate plan (generated during the initial
planning).

You currently change the plan in executor code. I suggest going back to the
planner if the assumption doesn't hold. The planner can then decide to
change the plan. The planner can also decide to fully replan if there is
reason to.

If the planner knows that it will need to replan when an assumption does not
hold during execution, then the cost of replanning multiplied by the chance
of the assumption not holding should be part of the decision to deliver a
plan with an assumption in the first place.

> There are also other cases such as MergeJoins performing btree index scans
> in order to obtain ordered results for a MergeJoin that would be better
> executed as a SeqScan when the MergeJoin can be removed.
> 
> Perhaps some costs could be adjusted at planning time when there's a
> possibility that joins could be removed at execution time, although I'm
> not quite sure about this as it risks generating a poor plan in the case
> when the joins cannot be removed.

Maybe this is a case where you are better off replanning if the assumption
doesn't hold instead of changing the generated execution plan. In that case
you can remove the join before the path is made.

> Comments are most welcome
> 
> Regards
> 
> David Rowley

Regards,

Mart



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers