Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-11-19 Thread Peter Geoghegan
On Sat, Nov 19, 2016 at 6:45 PM, Robert Haas  wrote:
>> What do you think about new argument with default vs. GUC? I guess
>> that the GUC might be a lot less of a foot-gun. We might even give it
>> a suitably scary name, to indicate that it will make the server PANIC.
>> (I gather that you don't care about other aspects of verbosity -- just
>> about the ability to make amcheck PANIC in the event of an invariant
>> violation without recompiling it.)
>
> Yikes.  I don't think I want to expose any kind of API that lets the
> user PANIC the server.  A value < ERROR sounds far more reasonable
> than a value > ERROR.

In general, I don't want to get into the business of reasoning about
how well we can limp along when there is a would-be error condition
within amcheck. Once "the impossible" has actually occurred, it's very
difficult to reason about what still works. Also, I actually agree
that making it possible for the tool to force a PANIC through a
user-visible interface is a bad idea.

Maybe we should just leave it as it is -- experts can recompile the
tool after modifying it to use an elevel that is != ERROR (the thing I
mention about elevel < ERROR is already documented in code comments).
If that breaks, they get to keep both halves.
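To make that concrete, the sort of recompile-time tweak I mean looks
roughly like this (a minimal sketch only -- the macro and function names
here are illustrative, not the actual identifiers in amcheck):

/*
 * Hypothetical sketch: funnel every invariant-violation report through
 * one elevel constant.  ERROR (the default) aborts the check; an expert
 * could rebuild with, say, WARNING to log and continue, accepting that
 * behavior past "the impossible" is hard to reason about.
 */
#define CORRUPTION_ELEVEL	ERROR

static void
report_corruption(const char *msg)
{
	ereport(CORRUPTION_ELEVEL,
			(errcode(ERRCODE_INDEX_CORRUPTED),
			 errmsg("%s", msg)));
}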

-- 
Peter Geoghegan




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-11-19 Thread Robert Haas
On Fri, Nov 18, 2016 at 6:51 PM, Peter Geoghegan  wrote:
> On Thu, Nov 17, 2016 at 12:04 PM, Peter Geoghegan  wrote:
>>> Hm, if we want that - and it doesn't seem like a bad idea - I think we
>>> should be make it available without recompiling.
>>
>> I suppose, provided it doesn't let CORRUPTION elevel be < ERROR. That
>> might be broken if it was allowed.
>
> What do you think about new argument with default vs. GUC? I guess
> that the GUC might be a lot less of a foot-gun. We might even give it
> a suitably scary name, to indicate that it will make the server PANIC.
> (I gather that you don't care about other aspects of verbosity -- just
> about the ability to make amcheck PANIC in the event of an invariant
> violation without recompiling it.)

Yikes.  I don't think I want to expose any kind of API that lets the
user PANIC the server.  A value < ERROR sounds far more reasonable
than a value > ERROR.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [sqlsmith] Crash on GUC serialization

2016-11-19 Thread Tom Lane
Robert Haas  writes:
> On Sat, Nov 19, 2016 at 12:31 PM, Tom Lane  wrote:
>> Thanks for the report!  Looks like the serialization code has overlooked
>> the fact that string-valued GUCs can be NULL.  Surprising we didn't
>> find that before ...

> Why do we allow this, anyway?

I think it simplifies some initialization cases.  Not sure how hard
it would be to remove that.  But this isn't the first such bug ...

regards, tom lane




Re: [HACKERS] [sqlsmith] Crash on GUC serialization

2016-11-19 Thread Robert Haas
On Sat, Nov 19, 2016 at 12:31 PM, Tom Lane  wrote:
> Andreas Seltenreich  writes:
>> sqlsmith just made a GUC that tricks the serialization code into
> >> dereferencing a null pointer.  Here's a recipe:
>
> Thanks for the report!  Looks like the serialization code has overlooked
> the fact that string-valued GUCs can be NULL.  Surprising we didn't
> find that before ...

I had no idea that was possible.  I think if we'd had a regression
test that set a GUC to such a value, I would have caught this in the
raft of bug fixes I submitted around December of last year, so maybe
we should add one.  Why do we allow this, anyway?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] pgpassfile connection option

2016-11-19 Thread Stephen Frost
All,

* Robert Haas (robertmh...@gmail.com) wrote:
> You could do something like that, I guess, but I think it might be a
> good idea to wait and see if anyone else has opinions on (1) the
> desirability of the basic feature, (2) the severity of the security
> hazard it creates, and (3) your proposed remediation method.
[...]
> Hey, everybody: chime in here...

The feature strikes me as pretty reasonable to have and the pghoard
example shows that it can be quite handy in some circumstances.  I don't
see much merit behind the security concern raised - the file in question
would have to have the correct format and you would have to be
connecting to a system listed in that file for any disclosure to happen,
no?  As such, I don't know that any remediation is necessary for this.

Thanks!

Stephen




Re: [HACKERS] Improvements in psql hooks for variables

2016-11-19 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > In reviewing this patch, I also noticed that it's set up to assume a
> > 'true' result when a variable can't be parsed by ParseVariableBool.
> 
> I suppose that's meant to be backwards-compatible with the current
> behavior:

Ah, good point, however..

> but I agree that if we're changing things in this area, that would
> be high on my list of things to change.  I think what we want going
> forward is to disallow setting "special" variables to invalid values,
> and that should hold both for regular variables that have special
> behaviors, and for the special-syntax cases like \timing.

I completely agree with you here.  We shouldn't be assuming "invalid"
means "true".

Thanks!

Stephen




Re: [HACKERS] Improvements in psql hooks for variables

2016-11-19 Thread Tom Lane
Stephen Frost  writes:
> In reviewing this patch, I also noticed that it's set up to assume a
> 'true' result when a variable can't be parsed by ParseVariableBool.

I suppose that's meant to be backwards-compatible with the current
behavior:

regression=# \timing foo
unrecognized value "foo" for "\timing"; assuming "on"
Timing is on.

but I agree that if we're changing things in this area, that would
be high on my list of things to change.  I think what we want going
forward is to disallow setting "special" variables to invalid values,
and that should hold both for regular variables that have special
behaviors, and for the special-syntax cases like \timing.

regards, tom lane




Re: [HACKERS] Improvements in psql hooks for variables

2016-11-19 Thread Stephen Frost
Daniel,

* Daniel Verite (dan...@manitou-mail.org) wrote:
> I'm attaching v3 of the patch with error reporting for
> FETCH_COUNT as mentioned upthread, and rebased
> on the most recent master.

Just FYI, there seem to be some issues with this patch: setting
my PROMPT1 and PROMPT2 variables results in rather odd behavior.

Here's what I use:

-
\set PROMPT1 '\033[33;1m%M(from '`hostname`').%/.%n.%> [%`date`]\033[0m\n=%x%# '
\set PROMPT2 '-%x%# '
-

In reviewing this patch, I also noticed that it's set up to assume a
'true' result when a variable can't be parsed by ParseVariableBool.

---
postgres=# \timing off
Timing is off.
postgres=# \timing asdsa
unrecognized value "asdsa" for "\timing": boolean expected
Timing is on.
---

That certainly doesn't feel right.  I'm thinking that if we're going to
throw an error back to the user about a value being invalid then we
shouldn't change the current value.

My initial thought was that perhaps we should pass the current value to
ParseVariableBool() and let it return whatever the "right" answer is.
However, your use of ParseVariableBool() for enums that happen to accept
on/off means that won't really work.

Perhaps the right answer is to flip this around a bit and treat booleans
as a special case of enums and have a generic solution for enums.
Consider something like:

ParseVariableEnum(valid_enums, str_value, name, curr_value);

'valid_enums' would be a simple list of what the valid values are for a
given variable and their corresponding value, str_value the string the
user typed, name the name of the variable, and curr_value the current
value of the variable.

ParseVariableEnum() could then detect if the string passed in is valid
or not and report to the user if it's incorrect and leave the existing
value alone.  This could also generically handle the question of if the
string passed in is a unique prefix of a correct value by comparing it
to all of the valid values and seeing if there's a unique match or not.
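As a strawman, such a helper might look like this (a rough sketch under
my assumptions -- the option struct, the option-table convention, and
the psql_error() reporting are illustrative, not existing psql code):

typedef struct VariableEnumOption
{
	const char *name;		/* accepted spelling, e.g. "on" */
	int			value;		/* value it maps to */
} VariableEnumOption;

/*
 * Sketch: accept str_value if it is a unique case-insensitive prefix of
 * exactly one valid option; otherwise complain and hand back curr_value
 * unchanged.  (A real version would prefer an exact match over prefix
 * ties, e.g. "on" vs. a hypothetical "only".)
 */
static int
ParseVariableEnum(const VariableEnumOption *valid_enums, int noptions,
				  const char *str_value, const char *name, int curr_value)
{
	int			match = -1;
	int			nmatches = 0;
	int			i;

	for (i = 0; i < noptions; i++)
	{
		if (pg_strncasecmp(str_value, valid_enums[i].name,
						   strlen(str_value)) == 0)
		{
			match = i;
			nmatches++;
		}
	}

	if (nmatches == 1)
		return valid_enums[match].value;

	psql_error("unrecognized value \"%s\" for \"%s\"; keeping current value\n",
			   str_value, name);
	return curr_value;
}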

Thoughts?

Thanks!

Stephen




Re: [HACKERS] Hash tables in dynamic shared memory

2016-11-19 Thread John Gorman
I reviewed the dht-v2.patch and found that it is in excellent shape.

Benchmarking shows that this performs somewhat faster than
dynahash, which is surprising because it is doing DSA address
translations on the fly.

One area where this could run faster is to reduce the amount of
time when the hash is in the RESIZE_IN_PROGRESS state.

When a hash partition reaches 75% full a resize begins by allocating
an empty new hash bucket array with double the number of buckets.
This begins the RESIZE_IN_PROGRESS state where there is both an
old and a new hash bucket array.

During the RESIZE state all operations such as find, insert,
delete and iterate run slower due to having to look items up in
two hash bucket arrays.

Whenever a new item is inserted into the hash and the hash is in
the RESIZE state the current code takes the time to move one item
from the old hash partition to the new hash partition. During
insertion an exclusive lock is already held on the partition so
this is an efficient time to do the resize cleanup work.

However since we only clean up one item per insertion we do not
complete the cleanup and free the old hash bucket array until we
are due to start yet another resize operation.

So we are effectively always in the RESIZE state which slows down
operations and wastes some space.

Here are the timings for insert and find in nanoseconds on a
Macbook Pro. The insert_resize figure includes the resize work to
move one item from the old to the new hash bucket array.

insert_resize: 945.98
insert_clean:  450.39
find_resize:   348.90
find_clean:    293.16

The attached dht-v2-resize-cleanup.patch patch applies on top of
the dht-v2.patch and speeds up the resize cleanup process so that
hashes spend most of their time in the clean state.

It does this by cleaning up more than one old item during
inserts. This patch cleans up three old items.

There is also the case where a hash receives all of its inserts
at the beginning and the rest of the work load is all finds. This
patch also cleans up two items for each find.

The downside of doing extra cleanup early is some additional
latency. Here are the experimental figures and the approximate
formulas for different numbers of cleanups for inserts. Note that
we cannot go lower than one cleanup per insert.

Resize work in inserts: 729.79 insert + 216.19 * cleanups

cleanups  ns/insert
1            945.98
2           1178.13
3           1388.73
4           1617.04
5           1796.91

Here are the timings for different numbers of cleanups for finds.
Note that since we do not hold an exclusive partition lock on a find
there is the additional overhead of taking that lock.

Resize work in finds: 348.90 dirty_find + 233.45 lock + 275.78 * cleanups

cleanups  ns/find
0            348.90
1            872.88
2           1138.98
3           1448.94
4           1665.31
5           1975.91

The new effective latencies during the resize vs. the clean states.

#define DHT_INSERT_CLEANS 3
#define DHT_SEARCH_CLEANS 2

insert_resize: 1388.73  -- 3 cleaned
insert_clean:   450.39
find_resize:   1138.98  -- 2 cleaned
find_clean:     293.16

The overall performance will be faster due to not having to examine
more than one hash bucket array most of the time.
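To make the mechanism concrete, the heart of the change is roughly the
following (a sketch of my reading of the patch -- only DHT_INSERT_CLEANS
and DHT_SEARCH_CLEANS come from the patch itself; the helper names and
partition fields are hypothetical):

#define DHT_INSERT_CLEANS 3		/* old items migrated per insert */
#define DHT_SEARCH_CLEANS 2		/* old items migrated per find */

/*
 * Hypothetical sketch: while RESIZE_IN_PROGRESS, migrate up to 'count'
 * items from the old bucket array into the new one.  Caller holds an
 * exclusive lock on the partition.
 */
static void
resize_cleanup(dht_partition *part, int count)
{
	while (count-- > 0 && part->old_nitems > 0)
	{
		dht_entry  *item = pop_next_old_item(part);	/* hypothetical */

		insert_into_new_buckets(part, item);		/* hypothetical */
		part->old_nitems--;
	}

	/* Old array drained: free it and leave RESIZE_IN_PROGRESS. */
	if (part->old_nitems == 0)
		finish_resize(part);						/* hypothetical */
}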

--

John Gorman
http://www.enterprisedb.com





[HACKERS] Bogus use of *-expansion in UPDATE

2016-11-19 Thread Tom Lane
You get an odd result if you write "something.*" in an UPDATE's
source expressions:

regression=# create table t1 (f1 int, f2 int, f3 int);
CREATE TABLE
regression=# update t1 set f1 = t1.*;
ERROR:  UPDATE target count mismatch --- internal error

The reason for that is that we use transformTargetList() to do
parse analysis of the source expressions, and it thinks it should
expand "t1.*" into "t1.f1, t1.f2, t1.f3" as we would do in a SELECT.
This seems nonsensical to me: there is no way that there can be
more than one source expression for a given target column.

It would make sense to do *-expansion in this context:

regression=# update t1 set (f1,f2,f3) = (t1.*);

since that's implicitly a row constructor and we're supposed to do
*-expansion in row constructors.  Currently it doesn't work:

ERROR:  number of columns does not match number of values
LINE 1: update t1 set (f1,f2,f3) = (t1.*);
   ^

which I think is a separate bug, or at least missing feature, in the
multi-target-column feature.  It would take a little work to fix
that, since the expansion would have to wait till parse analysis,
whereas currently we separate the column targets in gram.y.

In short, I think we should modify transformTargetList so that it
doesn't do *-expansion when processing a simple UPDATE SET expression.
I'm not sufficiently motivated to rewrite the code enough to fix the
second case right now, but the first case seems like a simple
bug fix.  I would expect

UPDATE tab SET composite_column = foo.*

to be capable of working, but right now it's busted.

regards, tom lane




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-11-19 Thread Tom Lane
Stephen Frost  writes:
> Let's create a new data type for this which supports old and new types,
> add what casts make sense, and call it a day.  Changing the data type
> name out from under people is not helping anyone.

+1.  I do not think changing behavior for the existing type name is
going to be a net win.  If we'd been smart enough to make the type
varlena from the get-go, maybe we could get away with it, but there
is just way too much risk of trouble with a change in a fixed-length
type's on-the-wire representation.

regards, tom lane




Re: [HACKERS] possible optimizations - pushing filter before aggregation

2016-11-19 Thread Tom Lane
Douglas Doole  writes:
> For min, you should be able to pre-filter =, <, and <=. In all cases the
> pre-filter would be <=. For max it would be =, >, and >=, becoming >=.

Doesn't really seem worth the trouble to me, given that those are pretty
unselective filter conditions.  If you could push down an = then it might
be worth doing ...

regards, tom lane




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-11-19 Thread Stephen Frost
* Shay Rojansky (r...@roji.org) wrote:
> >>>> As I said before, Npgsql for one loads data types by name, not by OID.
> >>>> So this would definitely cause breakage.
> >>>
> >>> Why would that cause breakage?
> >>
> >> Well, the first thing Npgsql does when it connects to a new database, is
> >> to query pg_type. The type names are used to associate entries with type
> >> handlers, avoiding the hard-coding of OIDs in code. So if the type name
> >> "macaddr" suddenly has a new meaning and its wire representation is
> >> different breakage will occur. It is possible to release new versions of
> >> Npgsql which will look at the PostgreSQL version and say "we know that in
> >> PostgreSQL < 10 macaddr means this, but in >= 10 it means that". But that
> >> doesn't seem like a good solution, plus old versions of Npgsql from before
> >> this change won't work.
> >
> > The new datatype that is going to replace the existing one works with
> > both 6 and 8 byte MAC addresses and stores them in a variable-length
> > format. This doesn't change the wire format. I don't see any problem
> > with the existing applications. The new datatype can recv and send
> > 8 byte MAC addresses also.
> 
> Apologies, I may have misunderstood. If the new type is 100%
> wire-compatible (recv/send) with the old fixed-length 6-byte type, then
> there's no issue whatsoever.

Uh, that certainly isn't correct, is it...?

The new data type is going to be able to send back 8-byte values to a
Npgsql driver that's not expecting to get back 8 byte macaddrs.  That
isn't going to work.

Further, I don't agree that we can simply rename the existing macaddr to
something else. Even if we keep the same OID, that's going to confuse
the heck out of people who are doing pg_upgrades from 9.6 to 10 and then
try to do migrations using their previous systems, or even to compare
the output of pg_dump from one to the next.

Let's create a new data type for this which supports old and new types,
add what casts make sense, and call it a day.  Changing the data type
name out from under people is not helping anyone.

Thanks!

Stephen




Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-19 Thread Karl O. Pinc
On Sat, 19 Nov 2016 18:59:49 +0100
Gilles Darold  wrote:

> On 11/19/2016 at 16:22, Karl O. Pinc wrote:
> > Hi Gilles,
> >
> > On Tue, 15 Nov 2016 15:15:52 -0600
> > "Karl O. Pinc"  wrote:
> >  
> >>> On Mon, 7 Nov 2016 23:29:28 +0100
> >>> Gilles Darold  wrote:
>    - Do not write current_logfiles when log_collector is activated
>  but log_destination doesn't contain stderr or csvlog. This was
>  creating an empty file that can confuse the user.
> >> Whether to write current_logfiles only when there's a stderr or
> >> csvlog seems dependent upon whether the current_logfiles file is
> >> for human or program use.  If for humans, don't write it unless it
> >> has content. If for programs, why make the program always have to
> >> check to see if the file exists before reading it?  Failing to
> >> check is just one more cause of bugs.  
> > What are your thoughts on this?  I'm leaning toward current_logfiles
> > being for programs, not people.  So it should be present whenever
> > logging_collector is on.  
> 
> My thought is that it is better to not have an empty file even if
> log_collector is on.

Thanks.   I wrote to be sure that you'd considered my thoughts.

I don't see a point in further discussion.  I may submit a separate
patch to the maintainers when we're ready to send them code so they can
see how the code looks with current_logfiles always in existence.

Further thoughts below.  No need to read them or respond.

> Programs cannot be confused, but humans can: if
> the file is present but empty, someone may want to check why this
> file is empty.

IMO if they want to know why it's empty they could read the
docs.

> Also, having a file containing two lines with just the
> log format without a path is worse for confusion than having an empty
> file.

I agree that it makes no sense to list log formats without paths.
 
> In other words, from a program's point of view, to gather the last log
> filename, whether or not the file exists doesn't make any difference.
> It is the responsibility of the programmer to cover all cases. From a
> non-expert user's point of view, there will always be the question: why
> is this file empty? If the file is not present, the question will
> instead be: how do I get the current log filename?

As a non-expert looking at the default data_directory (etc.) almost
nothing makes sense.  I see the non-expert using the
pg_current_logfile() function from within Postgres.

I also see a lot of non-experts writing programs to get log data and
then getting errors when the config changes and current_logfile goes
away.  Not having data in an opened file generally means a program
does nothing, an appropriate action when there are no log files
in the filesystem.  But not accounting for a non-existent file leads
to crashes.

Regards,


Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [HACKERS] [sqlsmith] Crash on GUC serialization

2016-11-19 Thread Tom Lane
Michael Paquier  writes:
> On Sat, Nov 19, 2016 at 9:31 AM, Tom Lane  wrote:
>> Thanks for the report!  Looks like the serialization code has overlooked
>> the fact that string-valued GUCs can be NULL.  Surprising we didn't
>> find that before ...

> I was half-way through it when you sent your email. It seems to me
> that we need to take care only of the case for PGC_STRING, per the
> attached.

Meh; that might be stopping the crash for you but you didn't consider what
to do in serialize_variable.  Some machines would dump core there instead,
and others would print "(null)" which wouldn't fit in the allocated space.

The bigger problem here is that set_config_option cannot be used to set
a string GUC's value to NULL --- it will treat value == NULL as a request
to set to the reset_val.  The best we can do is to convert the value
into an empty string, which isn't quite the same thing, though it may be
close enough (compare e45e990e4).

I'm also pretty unexcited about serializing float variables this way ---
assuming that sprintf("%.17g") will reconstruct doubles exactly is
just asking for trouble IMO.

So I can't escape the itchy feeling that this entire chunk of code
needs to be thrown away and rewritten differently.  But it'd take
some refactoring of the code under set_config_option to do it nicely,
which is more work than we probably want to put in now.

regards, tom lane




Re: [HACKERS] [BUG?] pg_event_trigger_ddl_commands() error with ALTER TEXT SEARCH CONFIGURATION

2016-11-19 Thread Michael Paquier
On Thu, Nov 17, 2016 at 1:17 PM, Alvaro Herrera
 wrote:
> It's a bug.  You're right that we need to handle the object class
> somewhere.  Perhaps I failed to realize that tsconfigs could get
> altered.

It seems to me that the thing to be careful of here is how a new
OBJECT_TSCONFIGURATIONMAP should use ObjectAddress. It does not seem
that complicated, but it needs some work.
-- 
Michael




Re: [HACKERS] [sqlsmith] Crash on GUC serialization

2016-11-19 Thread Michael Paquier
On Sat, Nov 19, 2016 at 9:51 AM, Andreas Seltenreich  wrote:
> Michael Paquier writes:
>
>> [2. text/plain; fix-guc-string-eval.patch]
>
> I'm afraid taking care of the length computation is not sufficient.
> ISTM like it'll still try to serialize the NULL pointer later on in
> serialize_variable:
>
> ,----[ guc.c:9108 ]
> | case PGC_STRING:
> | {
> |   struct config_string *conf = (struct config_string *) gconf;
> |   do_serialize(destptr, maxbytes, "%s", *conf->variable);
> `----

Hm, yes. Using an empty string strikes as being the best match.
-- 
Michael
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3c695c1..95f36b8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -8959,7 +8959,10 @@ estimate_variable_size(struct config_generic * gconf)
 			{
 				struct config_string *conf = (struct config_string *) gconf;
 
-				valsize = strlen(*conf->variable);
+				if (*conf->variable && **conf->variable)
+					valsize = strlen(*conf->variable);
+				else
+					valsize = 0;
 			}
 			break;
 
@@ -9109,7 +9112,9 @@ serialize_variable(char **destptr, Size *maxbytes,
 			{
 				struct config_string *conf = (struct config_string *) gconf;
 
-				do_serialize(destptr, maxbytes, "%s", *conf->variable);
+				do_serialize(destptr, maxbytes, "%s",
+							 *conf->variable && **conf->variable ?
+							 *conf->variable : "");
 			}
 			break;
 



Re: [HACKERS] possible optimizations - pushing filter before aggregation

2016-11-19 Thread Douglas Doole
>
> In the above case, wondering if we could do this:
>
> Min(a) = 2 is the condition; generate the condition "a <= 2" and push it
> down as a scan key. Since the pushed-down condition is a lossy one for us
> (it gives values < 2 as well), finally do a recheck of "Min(a) = 2".
>
> For Max(a) = 2 we can have "a >= 2".
>

After replying, I was thinking along these lines too. I can't see any
reason why it wouldn't work. The same would apply for HAVING clauses on
min/max aggregations as well.

For min, you should be able to pre-filter =, <, and <=. In all cases the
pre-filter would be <=. For max it would be =, >, and >=, becoming >=.

- Doug Doole
Salesforce


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-19 Thread Gilles Darold
On 11/19/2016 at 16:22, Karl O. Pinc wrote:
> Hi Gilles,
>
> On Tue, 15 Nov 2016 15:15:52 -0600
> "Karl O. Pinc"  wrote:
>
>>> On Mon, 7 Nov 2016 23:29:28 +0100
>>> Gilles Darold  wrote:  
   - Do not write current_logfiles when log_collector is activated
 but log_destination doesn't contain stderr or csvlog. This was
 creating an empty file that can confuse the user. 
>> Whether to write current_logfiles only when there's a stderr or csvlog
>> seems dependent upon whether the current_logfiles file is for human
>> or program use.  If for humans, don't write it unless it has content.
>> If for programs, why make the program always have to check to see
>> if the file exists before reading it?  Failing to check is just one
>> more cause of bugs.
> What are your thoughts on this?  I'm leaning toward current_logfiles
> being for programs, not people.  So it should be present whenever
> logging_collector is on.

My thought is that it is better to not have an empty file even if
log_collector is on. Programs cannot be confused, but humans can: if the
file is present but empty, someone may want to check why it is empty.
Also, having a file containing two lines with just the log format
without a path is worse for confusion than having an empty file.

In other words, from a program's point of view, to gather the last log
filename, whether or not the file exists doesn't make any difference.
It is the responsibility of the programmer to cover all cases. From a
non-expert user's point of view, there will always be the question: why
is this file empty? If the file is not present, the question will
instead be: how do I get the current log filename?


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org





Re: [HACKERS] [sqlsmith] Crash on GUC serialization

2016-11-19 Thread Andreas Seltenreich
Michael Paquier writes:

> [2. text/plain; fix-guc-string-eval.patch]

I'm afraid taking care of the length computation is not sufficient.
ISTM it'll still try to serialize the NULL pointer later on in
serialize_variable:

,----[ guc.c:9108 ]
| case PGC_STRING:
| {
|   struct config_string *conf = (struct config_string *) gconf;
|   do_serialize(destptr, maxbytes, "%s", *conf->variable);
`----




Re: [HACKERS] [sqlsmith] Crash on GUC serialization

2016-11-19 Thread Michael Paquier
On Sat, Nov 19, 2016 at 9:31 AM, Tom Lane  wrote:
> Andreas Seltenreich  writes:
>> sqlsmith just made a GUC that tricks the serialization code into
>> dereferencing a null pointer.  Here's a recipe:
>
> Thanks for the report!  Looks like the serialization code has overlooked
> the fact that string-valued GUCs can be NULL.  Surprising we didn't
> find that before ...

I was half-way through it when you sent your email. It seems to me
that we need to take care only of the case for PGC_STRING, per the
attached.
-- 
Michael
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3c695c1..3316092 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -8959,7 +8959,10 @@ estimate_variable_size(struct config_generic * gconf)
 			{
 				struct config_string *conf = (struct config_string *) gconf;
 
-				valsize = strlen(*conf->variable);
+				if (*conf->variable && **conf->variable)
+					valsize = strlen(*conf->variable);
+				else
+					valsize = 0;
 			}
 			break;
 



Re: [HACKERS] pg_dump -s -b

2016-11-19 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> IMO, blobs are data, so those behaviors are correct.  The docs should
> be fixed to match the code.

Works for me.

Thoughts on the attached?

Thanks!

Stephen
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 371a614..04b3124 100644
*** a/doc/src/sgml/ref/pg_dump.sgml
--- b/doc/src/sgml/ref/pg_dump.sgml
*************** PostgreSQL documentation
*** 138,145 ****
 
  Include large objects in the dump.  This is the default behavior
  except when --schema, --table, or
! --schema-only is specified, so the -b
! switch is only useful to add large objects to selective dumps.
 

   
--- 138,148 ----
 
  Include large objects in the dump.  This is the default behavior
  except when --schema, --table, or
! --schema-only is specified.  The -b
! switch is therefore only useful to add large objects to dumps
! where a specific schema or table has been requested.  Note that
! blobs are considered data and therefore will be included when
! --data-only is used, but not when --schema-only is.
 

   




Re: [HACKERS] [sqlsmith] Crash on GUC serialization

2016-11-19 Thread Tom Lane
Andreas Seltenreich  writes:
> sqlsmith just made a GUC that tricks the serialization code into
> dereferencing a null pointer.  Here's a recipe:

Thanks for the report!  Looks like the serialization code has overlooked
the fact that string-valued GUCs can be NULL.  Surprising we didn't
find that before ...

regards, tom lane




Re: [HACKERS] Mail thread references in commits

2016-11-19 Thread Joshua Drake
My solution requires that everything have an issue. E.g., hackers becomes a
tracker.

Sincerely,
 Jd

On Nov 19, 2016 09:04, "Tom Lane"  wrote:

> "Joshua D. Drake"  writes:
> > I wonder if now is the time (again) to consider an issue tracker.
>
> That would make the problem at hand worse, not better, because you'd
> get nothing at all for cases that were too trivial to make an issue
> tracker entry for, or that the committer couldn't be bothered to go
> find in the issue tracker.  We don't even have very widespread adherence
> to the cite-a-message-thread convention yet (so far as I can tell,
> Andres and I are the only committers doing it at all).  Adding another
> step of bureaucracy to our processes is just going to fail miserably.
>
> regards, tom lane
>


Re: [HACKERS] pg_dump -s -b

2016-11-19 Thread Tom Lane
Stephen Frost  writes:
> To wit, our docs specifically say:
> 
> Include large objects in the dump. This is the default behavior except
> when --schema, --table, or --schema-only is specified, so the -b switch
> is only useful to add large objects to selective dumps.
> 

Clearly, this is an oversimplification.

> However, a 'pg_dump -s -b' won't include blobs, which appears to be due
> to the masking performed in _tocEntryRequired() whereby we strip the
> REQ_DATA out if we are in schema-only mode.
> Further, though perhaps less surprising, we don't include blobs when
> --section=post-data is set, and only the definition of the blob (but no
> data) when --section=pre-data is used.

IMO, blobs are data, so those behaviors are correct.  The docs should
be fixed to match the code.

regards, tom lane




[HACKERS] pg_dump -s -b

2016-11-19 Thread Stephen Frost
All,

While reviewing and considering the patch to add the --no-blobs option
to pg_dump, which I entirely agree with, I came across our claim that
passing -b to pg_dump would cause blobs to be output, regardless of
other options set.

To wit, our docs specifically say:


Include large objects in the dump. This is the default behavior except
when --schema, --table, or --schema-only is specified, so the -b switch
is only useful to add large objects to selective dumps.


However, a 'pg_dump -s -b' won't include blobs, which appears to be due
to the masking performed in _tocEntryRequired() whereby we strip the
REQ_DATA out if we are in schema-only mode.

Further, though perhaps less surprising, we don't include blobs when
--section=post-data is set, and only the definition of the blob (but no
data) when --section=pre-data is used.

This appears to go back at least as far as all currently supported
branches.

We have a few options here.  The documentation says 'selective dumps',
which we could infer to mean 'dumps where the user specified a table' or
similar and exclude 'schema-only' from that case.  I don't believe this
was the original intent, given this comment in pg_dump.c:

/*
 * Dumping blobs is now default unless we saw an inclusion switch or -s
 * ... but even if we did see one of these, -b turns it back on.
 */

Given the lack of field complaints about this, I'm pretty tempted to
just adjust that comment and the documentation.  Alternatively, we could
hack things up in a similar manner to what was done in a7e5457, and
cause blobs to be dumped even in schema-only mode if -b is passed.

Thoughts?

Thanks!

Stephen




Re: [HACKERS] Mail thread references in commits

2016-11-19 Thread Tom Lane
"Joshua D. Drake"  writes:
> I wonder if now is the time (again) to consider an issue tracker.

That would make the problem at hand worse, not better, because you'd
get nothing at all for cases that were too trivial to make an issue
tracker entry for, or that the committer couldn't be bothered to go
find in the issue tracker.  We don't even have very widespread adherence
to the cite-a-message-thread convention yet (so far as I can tell,
Andres and I are the only committers doing it at all).  Adding another
step of bureaucracy to our processes is just going to fail miserably.

regards, tom lane




Re: [HACKERS] Mail thread references in commits

2016-11-19 Thread Tom Lane
Andrew Dunstan  writes:
> On 11/19/2016 10:11 AM, Stephen Frost wrote:
>> That's actually not the case if we use a hash, because the client could
>> figure out what the hash is before sending it.

> The fact that it could possibly be done is not a good reason for doing 
> it.

Quite.  I will *not* go along with any proposal that requires me to
duplicate some hashing logic being done elsewhere.  Way too much risk
of error there.

> Every proposal along these lines strikes me as making more unobvious 
> hoops for people to jump through. We're just reinventing the wheel here 
> because we don't like the color of the standard wheels that some email 
> MUAs produce.

Yeah, anything like this would be going very very far out of our way to
keep these lines in commit messages under an arbitrary limit (which we're
already exceeding anyway in many cases).  I'm inclined to think we should
just straight up decide that having the message link be a clickable URL is
worth the longer lines, or that it is not.  Personally I vote for "not",
but I recognize that I might be in the minority.

BTW, before we give up completely on reconciling the two objectives,
is anyone aware of a line-continuation convention that would be likely
to work?  If we could do something like

Discussion: https://www.postgresql.org/message-id/\
caf4au4w6rrh_j1bvvhzposrihcog7sgj3lsx0ty8zdwhht8...@mail.gmail.com

it would improve matters all around.

regards, tom lane




Re: [HACKERS] Mail thread references in commits

2016-11-19 Thread Joshua D. Drake

On 11/17/2016 01:02 PM, Andrew Dunstan wrote:

I love seeing references to email threads in commit messages. It would
make them a lot friendlier though if a full http URL were included
instead of just a Message-ID, i.e. instead of  put
. I know this is
a bit more trouble. but not that much, and it would make it very easy to
follow with just a mouse click.


This thread as a whole is going on and on. Instead I thought I would 
reply back to the OP and see where that goes.


I too like to see references in commit messages to the email threads. It 
definitely makes it easier to track the collaboration of the commit.


I wonder if now is the time (again) to consider an issue tracker. Think 
about it! This whole thread would be solved with a commit message that said:


Allow replicated slaves to load balance plans to take load off the 
master per #18235 -- Robert Haas


Then all you have to do is look up #18235 in the issue tracker and voilà!

I know our community historically has been against most things NIH but 
if we step back objectively --- using an issue tracker for this is 
certainly no crazier than implementing yet another new wheel.


Sincerely,

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] How to change order sort of table in HashJoin

2016-11-19 Thread Tom Lane
Man Trieu  writes:
> As in the example below, I think the plan in which the hash table is
> created on testtbl2 (the table with fewer tuples) should be chosen.

The planner usually prefers to hash on the table that has a flatter
MCV histogram, since a hash table with many key collisions will be
inefficient.  You might find it illuminating to read the comments around
estimate_hash_bucketsize().

In general, given a hashtable that fits in memory and light bucket
loading, a hash join is more or less O(M) + O(N); it doesn't matter
so much whether the larger table is on the inside.  It does matter if
the table gets big enough to force batching of the join, but that's
not happening in your example (at least not the first one; it's unclear
to me why it did happen in the second one).  The key thing that will
drive the choice, then, is avoiding a skewed bucket distribution that
causes lots of comparisons for common values.
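For a back-of-the-envelope feel for that last point (illustrative
arithmetic only, not the planner's actual code in
estimate_hash_bucketsize()):

/*
 * Expected comparisons per probe against an inner table of M rows
 * hashed into B buckets, when one common value accounts for fraction f
 * of both the inner and the probe side.  As f grows, the f*f*M term
 * quickly dominates the M/B term, which is why the planner prefers to
 * hash the table with the flatter distribution.
 */
static double
expected_probe_comparisons(double M, double B, double f)
{
	double		mcv_chain = f * M;				/* chain for the common value */
	double		avg_chain = (1.0 - f) * M / B;	/* everything else */

	return f * mcv_chain + (1.0 - f) * avg_chain;
}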

regards, tom lane




Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-19 Thread Karl O. Pinc
Hi Gilles,

On Tue, 15 Nov 2016 15:15:52 -0600
"Karl O. Pinc"  wrote:

> > On Mon, 7 Nov 2016 23:29:28 +0100
> > Gilles Darold  wrote:  

> > >   - Do not write current_logfiles when log_collector is activated
> > > but log_destination doesn't contain stderr or csvlog. This was
> > > creating an empty file that can confuse the user. 

> Whether to write current_logfiles only when there's a stderr or csvlog
> seems dependent upon whether the current_logfiles file is for human
> or program use.  If for humans, don't write it unless it has content.
> If for programs, why make the program always have to check to see
> if the file exists before reading it?  Failing to check is just one
> more cause of bugs.

What are your thoughts on this?  I'm leaning toward current_logfiles
being for programs, not people.  So it should be present whenever
logging_collector is on.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [HACKERS] Mail thread references in commits

2016-11-19 Thread Andrew Dunstan



On 11/19/2016 10:11 AM, Stephen Frost wrote:

* Magnus Hagander (mag...@hagander.net) wrote:

On Nov 19, 2016 15:33, "Andrew Dunstan"  wrote:

I can think of at least one scenario where you might not easily have
access to any invented ID - you're the author and you have mailing list
me-too turned off, so you don't have anything that the mailing list has
inserted. I honestly think we should just stick with the Message-ID value
and get over the ugliness.

That's actually not the case if we use a hash, because the client could
figure out what the hash is before sending it.



The fact that it could possibly be done is not a good reason for doing 
it. Every proposal along these lines strikes me as making more unobvious 
hoops for people to jump through. We're just reinventing the wheel here 
because we don't like the color of the standard wheels that some email 
MUAs produce.



cheers

andrew




Re: [HACKERS] Mail thread references in commits

2016-11-19 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
> On Nov 19, 2016 15:33, "Andrew Dunstan"  wrote:
> > I can think of at least one scenario where you might not easily have
> > access to any invented ID - you're the author and you have mailing list
> > me-too turned off, so you don't have anything that the mailing list has
> > inserted. I honestly think we should just stick with the Message-ID value
> > and get over the ugliness.

That's actually not the case if we use a hash, because the client could
figure out what the hash is before sending it.

> > Putting the clickable URL in a header("X-Archive-Link" ?) is a good idea,
> > though. Using my MUA (Thunderbird) that would make it as simple to get as
> > View Source (Ctrl-U) and a simple Ctrl-C, which would make committers'
> > lives easy. I bet others might appreciate it too.
> 
> As a thunderbird user you might be interested in something like
> https://github.com/mhagander/pgarchives.  I don't know if it works with
> modern versions of thunderbird, but hopefully it does. And would be more
> convenient than the view source way..

Particulalrly if there is an extension for it...

Thanks!

Stephen




Re: [HACKERS] Mail thread references in commits

2016-11-19 Thread Magnus Hagander
On Nov 19, 2016 15:33, "Andrew Dunstan"  wrote:
>
>
>
> On 11/19/2016 07:36 AM, Stephen Frost wrote:
>>
>> Magnus,
>>
>> * Magnus Hagander (mag...@hagander.net) wrote:
>>>
>>> On Sat, Nov 19, 2016 at 1:07 AM, Stephen Frost 
wrote:

 * Magnus Hagander (mag...@hagander.net) wrote:
>
> It would make the URLs actually short, but as mentioned upthread, that
> wouldn't work at all if offline. So it'd be a tradeoff between those, but
> so are pretty much all other options that don't include the full
> message-id.

 This is a bit of a crazy idea, but in the new list system, couldn't we
 add a header which includes "our" surrogate message-id?  Or possibly
the
 entire URL to the message, and maybe the URL for the entire thread?
>>>
>>> I'd rather not tie those systems in that tightly. I think they are much
>>> better off being de-coupled.
>>
>> I get that, but...
>>
>>> That said, what we could do is invent our own "id". We could either use a
>>> separate surrogate key, or we could do the sha-1 hash of the messageid. And
>>> stick that in a header, which could then be searched for both locally and
>>> remotely.
>>
>> Yeah, that's a good thought too.  I think we'd need to use a SHA1 to
>> avoid collisions which means that it'll be a bit longer than if we used
>> an actual ID, but it shouldn't be *too* long.
>>
>
>
> I can think of at least one scenario where you might not easily have
> access to any invented ID - you're the author and you have mailing list
> me-too turned off, so you don't have anything that the mailing list has
> inserted. I honestly think we should just stick with the Message-ID value
> and get over the ugliness.
>
> Putting the clickable URL in a header("X-Archive-Link" ?) is a good idea,
> though. Using my MUA (Thunderbird) that would make it as simple to get as
> View Source (Ctrl-U) and a simple Ctrl-C, which would make committers'
> lives easy. I bet others might appreciate it too.

As a thunderbird user you might be interested in something like
https://github.com/mhagander/pgarchives.  I don't know if it works with
modern versions of thunderbird, but hopefully it does. And would be more
convenient than the view source way..

/Magnus


Re: [HACKERS] [GENERAL] How to change order sort of table in HashJoin

2016-11-19 Thread Melvin Davidson
On Sat, Nov 19, 2016 at 12:46 AM, Man Trieu  wrote:

> Hi Experts,
>
> As in the example below, I think the plan in which the hash table is
> created on testtbl2 (the table with fewer tuples) should be chosen,
> because creating the hash table on testtbl2 should be faster. But it
> was not.
>
> I have tried to change the ordering of the tables by tuning parameters,
> and even by using pg_hint_plan, but without success.
>
> Why does the planner not choose the plan in which the hash table is
> created on testtbl2 (which should take less time)?
> And how can I change the order?
>
> # I also checked the planner info by rebuilding PostgreSQL but found no
> related useful info about the hash table
>
> ---
> postgres=# create table testtbl1(id integer, c1 text, c2 text, c3 text,
> primary key (c1,c2,c3));
> CREATE TABLE
> postgres=# create table testtbl2(id integer, c1 text, c2 text, c3 text,
> primary key (c1,c2,c3));
> CREATE TABLE
> postgres=# insert into testtbl1 select generate_series(1,100),
> random()::text,random()::text,random()::text;
> INSERT 0 100
> postgres=# insert into testtbl2 select * from testtbl1 where id%7 = 0;
> INSERT 0 142857
>
> postgres=# explain analyze select * from testtbl1 inner join testtbl2
> using(c1,c2,c3);
>QUERY PLAN
> ------------------------------------------------------------------------------
>  Hash Join  (cost=38775.00..47171.72 rows=1 width=59) (actual
> time=1120.824..1506.236 rows=142857 loops=1)
>Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2)
> AND (testtbl2.c3 = testtbl1.c3))
>->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 width=56)
> (actual time=0.008..27.964 rows=142857 loops=1)
>->  Hash  (cost=21275.00..21275.00 rows=100 width=55) (actual
> time=1120.687..1120.687 rows=100 loops=1)
>  Buckets: 131072  Batches: 1  Memory Usage: 89713kB
>  ->  Seq Scan on testtbl1  (cost=0.00..21275.00 rows=100
> width=55) (actual time=0.035..458.522 rows=100 loops=1)
>  Planning time: 0.922 ms
>  Execution time: 1521.258 ms
> (8 rows)
>
> postgres=# set pg_hint_plan.enable_hint to on;
> SET
> postgres=# /*+
> postgres*# HashJoin(testtbl1 testtbl2)
> postgres*# Leading(testtbl1 testtbl2)
> postgres*# */
> postgres-# explain analyze select * from testtbl1 inner join testtbl2
> using(c1,c2,c3);
>QUERY PLAN
> ------------------------------------------------------------------------------
>  Hash Join  (cost=48541.00..67352.86 rows=1 width=59) (actual
> time=1220.625..1799.709 rows=142857 loops=1)
>Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2)
> AND (testtbl2.c3 = testtbl1.c3))
>->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 width=56)
> (actual time=0.011..58.649 rows=142857 loops=1)
>->  Hash  (cost=21275.00..21275.00 rows=100 width=55) (actual
> time=1219.295..1219.295 rows=100 loops=1)
>  Buckets: 8192  Batches: 32  Memory Usage: 2851kB
>  ->  Seq Scan on testtbl1  (cost=0.00..21275.00 rows=100
> width=55) (actual time=0.021..397.583 rows=100 loops=1)
>  Planning time: 3.971 ms
>  Execution time: 1807.710 ms
> (8 rows)
>
> postgres=#
> ---
>
>
> Thanks and best regard!
>




AFAIK, the only way to change a sort order is to use the ORDER BY clause
in the SELECT. https://www.postgresql.org/docs/9.4/static/sql-select.html
"8. If the ORDER BY clause is specified, the returned rows are sorted in
the specified order. If ORDER BY is not given, the rows are returned in
whatever order the system finds fastest to produce."

-- 
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [HACKERS] Mail thread references in commits

2016-11-19 Thread Andrew Dunstan



On 11/19/2016 07:36 AM, Stephen Frost wrote:

Magnus,

* Magnus Hagander (mag...@hagander.net) wrote:

On Sat, Nov 19, 2016 at 1:07 AM, Stephen Frost  wrote:

* Magnus Hagander (mag...@hagander.net) wrote:

It would make the URLs actually short, but as mentioned upthread, that
wouldn't work at all if offline. So it'd be a tradeoff between those, but
so are pretty much all other options that don't include the full

message-id.

This is a bit of a crazy idea, but in the new list system, couldn't we
add a header which includes "our" surrogate message-id?  Or possibly the
entire URL to the message, and maybe the URL for the entire thread?

I'd rather not tie those systems in that tightly. I think they are much
better off being de-coupled.

I get that, but...


That said, what we could do is invent our own "id". We could either use a
separate surrogate key, or we could do the sha-1 hash of the messageid. And
stick that in a header, which could then be searched for both locally and
remotely.

Yeah, that's a good thought too.  I think we'd need to use a SHA1 to
avoid collisions which means that it'll be a bit longer than if we used
an actual ID, but it shouldn't be *too* long.




I can think of at least one scenario where you might not easily have 
access to any invented ID - you're the author and you have mailing list 
me-too turned off, so you don't have anything that the mailing list has 
inserted. I honestly think we should just stick with the Message-ID 
value and get over the ugliness.


Putting the clickable URL in a header("X-Archive-Link" ?) is a good 
idea, though. Using my MUA (Thunderbird) that would make it as simple to 
get as View Source (Ctrl-U) and a simple Ctrl-C, which would make 
committers' lives easy. I bet others might appreciate it too.


cheers

andrew




Re: [HACKERS] possible optimizations - pushing filter before aggregation

2016-11-19 Thread Mithun Cy
On Sat, Nov 19, 2016 at 8:59 AM, Pavel Stehule 
wrote:

>
>
>
>> SELECT MIN(a) m FROM
>>(SELECT a FROM t WHERE a=2) AS v(a)
>>
>> The subquery is going to return an intermediate result of:
>>
>> V:A
>> ---
>> 2
>>
>> And the minimum of that is 2, which is the wrong answer.
>>
>
> yes, you are right,
>
> In the above case, wondering if we could do this:

Min(a) = 2 is the condition; generate the condition "a <= 2" and push it
down as a scan key. Since the pushed-down condition is a lossy one for us
(it gives values < 2 as well), finally do a recheck of "Min(a) = 2".

For Max(a) = 2 we can have "a >= 2".

If both are given we can combine them appropriately.
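In backend terms, the derived pre-filter could be built about like this
(an illustrative fragment; ScanKeyInit() is the existing helper, but the
attribute number, the int4 operator choice, and the recheck scaffolding
are assumptions, not an actual patch):

/* Hypothetical: derive a lossy scan key "a <= 2" from MIN(a) = 2. */
ScanKeyData key;

ScanKeyInit(&key,
			attnum_a,					/* hypothetical attno of column a */
			BTLessEqualStrategyNumber,	/* <= */
			F_INT4LE,					/* int4 <= comparator */
			Int32GetDatum(2));

/* ... scan with 'key' and compute MIN(a) over the surviving rows ... */

/* The key is lossy (rows with a < 2 also pass), so recheck at the end: */
recheck_ok = (min_result == 2);		/* discard the result otherwise */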

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-19 Thread Karl O. Pinc
On Sat, 19 Nov 2016 12:58:47 +0100
Gilles Darold  wrote:

> All patches you've submitted on the v13 patch have been applied and
> are present in the attached v14 of the patch. I have not included the
> patches about GUC changes because I'm not sure that adding a new file
> (include/utils/guc_values.h) just for that will be accepted, or that it
> will not require more global work to add other GUC values. However,
> perhaps this patch can be submitted separately if the decision is not
> taken here.

Understood.  I've a couple of other patches that do
a little cleanup on master that I'd also like to submit
along with your patch.  This on the theory that
the maintainers will be looking at this code anyway
because your patch touches it.  All this can be submitted
for their review at once.  My approach is to be minimally invasive on
a per-patch basis (i.e. your patch) but add small patches
that make existing code "better" without touching
functionality.  (Deleting unnecessary statements, etc.)
The overall goal being a better code base.

I've found what I think is another bug, where a 
long-stale filename could be written to current_logfiles.
I've coded a patch.

I also have a patch to ensure that current_logfiles never
"skips" a logfile due to ENFILE or ENFILE in
logfile_writename() (only).

I'm done with all the coding and need to refactor on top
of your v14 patch and on top of each other.  I'll send all
patches back to you for your review.  (Unless you've
some objection.)  I may not get to this until Monday morning.

I haven't thoroughly tested the ENFILE/EMFILE patch.
I was planning on doing that while you looked over the
code.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [HACKERS] Proposal: scan key push down to heap [WIP]

2016-11-19 Thread Dilip Kumar
On Mon, Nov 14, 2016 at 9:44 PM, Dilip Kumar  wrote:
>> Also, what if we abandoned the idea of pushing qual evaluation all the
>> way down into the heap and just tried to do HeapKeyTest in SeqNext
>> itself?  Would that be almost as fast, or would it give up most of the
>> benefits?
> This we can definitely test. I will test and post the data.
>
> Thanks for the suggestion.

Here are some results with this approach...

create table test(a int, b varchar, c varchar, d int);
insert into test values(generate_series(1,1000), repeat('x', 30),
repeat('y', 30), generate_series(1,1000));
analyze test;

query: explain analyze select * from test where a < $1;

selectivity    head    patch1    patch2    patch3
10.00%          840       460       582       689
20.00%          885       563       665       752
50.00%         1076       786       871       910
80.00%         1247       988      1055      1099
100.00%        1386      1123      1193      1237

patch1: Original patch (heap_scankey_pushdown_v1.patch), only
supported for fixed length datatype and use heap_getattr.

patch2: Switches memory context in HeapKeyTest + Store tuple in slot
and use slot_getattr instead of heap_getattr.

patch3: call HeapKeyTest in SeqNext after storing slot, and also
switch memory context before calling HeapKeyTest.

Summary: (performance data)

1. At 10% selectivity patch1 shows a > 40% gain, which drops to 30% in
patch2 and finally to 17% in patch3.
2. I think patch1 wins over patch2 because patch1 avoids the call to
ExecStoreTuple.
3. Patch2 wins over patch3 because patch2 can reject tuples in
heapgettup_pagemode whereas patch3 only does so in SeqNext, so patch2
avoids some function-call instructions.

Summary (various patches and problems)
--------------------------------------
Patch1:
problem1: This approach has a performance problem in some cases (quals
on many columns of the table + high selectivity (>50%)).
problem2: HeapKeyTest uses ExecutorContext so we need to block any
datatype which needs memory allocation during eval functions.

Patch2: Patch2 solves both the problems of patch1, but exposes
executor items to heap and it's not a good design.

Patch3: This solves all the problems exists in patch1+patch2, but
performance is significantly less.

I haven't yet tested patch3 with TPC-H; I will do that once a machine
is available.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] [sqlsmith] Crash on GUC serialization

2016-11-19 Thread Andreas Seltenreich
Hi,

sqlsmith just created a GUC that tricks the serialization code into
dereferencing a null pointer.  Here's a recipe:

--8<---cut here---start->8---
set min_parallel_relation_size to 0;
set max_parallel_workers_per_gather to 2;
set force_parallel_mode to on;
begin;
select set_config('foo.baz', null, b) from (values (false), (true)) g(b);
commit;
select 1;
--8<---cut here---end--->8---
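
For anyone digging in: the force_parallel_mode settings matter because
the crash appears to be in the code that serializes GUC state to
parallel workers, and the trigger is a string-valued GUC set to NULL.
A standalone illustration of the needed guard -- the real code lives
in guc.c's size-estimation/serialization routines, and the actual fix
may be shaped differently:

#include <stdio.h>
#include <string.h>

/*
 * A string GUC's value may legitimately be NULL, so its serialized
 * size cannot be computed with an unconditional strlen().
 */
static size_t
string_guc_serialized_size(const char *val)
{
    if (val == NULL)
        return 1;               /* e.g. a single "value is NULL" flag byte */
    return 1 + strlen(val) + 1; /* flag byte + string + NUL terminator */
}

int
main(void)
{
    printf("%zu\n", string_guc_serialized_size(NULL));      /* 1 */
    printf("%zu\n", string_guc_serialized_size("foo.baz")); /* 9 */
    return 0;
}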

regards,
Andreas




Re: [HACKERS] Mail thread references in commits

2016-11-19 Thread Stephen Frost
Magnus,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Sat, Nov 19, 2016 at 1:07 AM, Stephen Frost  wrote:
> > * Magnus Hagander (mag...@hagander.net) wrote:
> > > It would make the URLs actually short, but as mentioned upthread, that
> > > wouldn't work at all if offline. So it'd be a tradeoff between those, but
> > > so are pretty much all other options that don't include the full
> > > message-id.
> >
> > This is a bit of a crazy idea, but in the new list system, couldn't we
> > add a header which includes "our" surrogate message-id?  Or possibly the
> > entire URL to the message, and maybe the URL for the entire thread?
> 
> I'd rather not tie those systems in that tightly. I think they are much
> better off being de-coupled.

I get that, but...

> That said, what we could do is invent our own "id". We could either use a
> separate surrogate key, or we could use the SHA-1 hash of the message-id,
> and stick that in a header, which could then be searched for both locally
> and remotely.

Yeah, that's a good thought too.  I think we'd need to use a SHA-1 to
avoid collisions, which means it'll be a bit longer than if we used
an actual ID, but it shouldn't be *too* long.
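
For a sense of the length: a SHA-1 digest is 40 hex characters.  A
quick standalone sketch using OpenSSL (the message-id is made up):

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>        /* link with -lcrypto */

int
main(void)
{
    /* hypothetical message-id, for illustration only */
    const char *msgid = "example-message-id@postgresql.org";
    unsigned char md[SHA_DIGEST_LENGTH];
    char        hex[SHA_DIGEST_LENGTH * 2 + 1];
    int         i;

    SHA1((const unsigned char *) msgid, strlen(msgid), md);
    for (i = 0; i < SHA_DIGEST_LENGTH; i++)
        sprintf(hex + 2 * i, "%02x", md[i]);

    /* 40 hex characters: longer than a serial id, but workable */
    printf("%s\n", hex);
    return 0;
}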

Thanks!

Stephen




Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-19 Thread David Steele
On 11/18/16 12:38 PM, David Steele wrote:

> On 11/14/16 4:29 AM, Michael Paquier wrote:
>> On Mon, Nov 14, 2016 at 6:10 PM, Kyotaro HORIGUCHI
>>> If I'm not missing something, at the worst we have a checkpoint
>>> after a checkpointer restart that should have been suppressed. Is
>>> it worth picking it up for the complexity?
> 
> That's the way I read it as well.  It's not clear to me how the
> checkpointer would get restarted under normal circumstances.
> 
> I did a kill on the checkpointer and it was ignored.  After a kill -9
> the checkpointer process came back but also switched the xlog.  Is this
> the expected behavior?

Ah, never mind.  I can see this caused a restart and recovery, so the
archive timeout was reset and a switch occurred after the timeout.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-19 Thread Gilles Darold
Le 16/11/2016 à 17:38, Karl O. Pinc a écrit :
> On Mon, 7 Nov 2016 23:29:28 +0100
> Gilles Darold  wrote:
>
>> Here is the v13 of the patch, ...
> Attached is a doc patch to apply on top of v13.
>
> It adds a lot more detail regarding just what is
> in the current_logfiles file and when it's
> in current_logfiles.  I'd like review both for
> language and accuracy, but I'd also like to
> confirm that we really want the documented
> behavior regarding what's there at backend startup
> vs. what's there during normal runtime.  Seems
> ok the way it is to me but...
>
> This patch is also starting to put a lot of information
> inside a graphical table.  I'm fine with this but
> it's a bit of a departure from the other cells of
> the table so maybe somebody else has a better
> suggestion.
>
> Regards,
>
> Karl 
> Free Software:  "You don't pay back, you pay forward."
>  -- Robert A. Heinlein
>

All patches you've submitted on the v13 patch have been applied and are
present in the attached v14 of the patch. I have not included the patches
about GUC changes because I'm not sure that adding a new file
(include/utils/guc_values.h) just for that will be accepted, or that it
will not require more global work to add other GUC values. However,
perhaps this patch can be submitted separately if the decision is not
taken here.

Regards,

-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index adab2f8..645cbb9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4143,6 +4143,12 @@ SELECT * FROM parent WHERE key = 2400;
  server log
 
 
+When logs are written to the file system, their paths, names, and
+types are recorded in
+the  file.  This provides
+a convenient way to find and access log content without establishing a
+database connection.
+
 
  Where To Log
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2e64cc4..f5bfe59 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15443,6 +15443,19 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
   
 
   
+   pg_current_logfile()
+   text
+   primary log file name in use by the logging collector
+  
+
+  
+   pg_current_logfile(text)
+   text
+   log file name, of log in the requested format, in use by the
+   logging collector
+  
+
+  
pg_my_temp_schema()
oid
OID of session's temporary schema, or 0 if none
@@ -15660,6 +15673,34 @@ SET search_path TO schema , schema, ..
 the time when the postmaster process re-read the configuration files.)

 
+
+pg_current_logfile
+   
+
+   
+Logging
+pg_current_logfile function
+   
+
+   
+pg_current_logfile returns, as text,
+the path of either the csv or stderr log file currently in use by the
+logging collector.  This is a path including the
+ directory and the log file name.
+Log collection must be active or the return value
+is NULL.  When multiple logfiles exist, each in a
+different format, pg_current_logfile called
+without arguments returns the path of the file having the first format
+found in the ordered
+list: stderr, csvlog.
+NULL is returned when no log file has any of these
+formats.  To request a specific file format, supply,
+as text, either csvlog
+or stderr as the value of the optional parameter. The
+return value is NULL when the log format requested
+is not a configured .
+   
+

 pg_my_temp_schema

diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 5c52824..256a14f 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -60,6 +60,48 @@ Item
  Subdirectory containing per-database subdirectories
 
 
+
+ 
+  
+   current_logfiles
+  
+  
+   Logging
+   current_logfiles file
+  
+  current_logfiles
+ 
+ 
+  A file recording the log file(s) currently written to by the syslogger
+  and the file's log formats, stderr
+  or csvlog. Each line of the file is a space separated list of
+  two elements: the log format and the full path to the log file including the
+  value of .
+
+  During server startup current_logfiles reports
+  the default server log destination.  After server startup the log format
+  must be present in  to be listed in
+  current_logfiles.
+
+  
+Non-PostgreSQL log sources, such as 3rd-party
+libraries, which deliver error messages directly to stderr, are
+always logged by PostgreSQL to stderr.  Stderr
+is also the destination for any incomplete log messages produced by
+PostgreSQL.  When
+stderr is not in
+,
+current_logfiles does not contain the name of the
+file where these sorts of log messages are written.
+  
+  
+  The current_logfiles file
+  is present only when  is
+  activated and when at least one of 

Re: [HACKERS] Mail thread references in commits

2016-11-19 Thread Magnus Hagander
On Sat, Nov 19, 2016 at 1:07 AM, Stephen Frost  wrote:

> Magnus,
>
> * Magnus Hagander (mag...@hagander.net) wrote:
> > It would make the URLs actually short, but as mentioned upthread, that
> > wouldn't work at all if offline. So it'd be a tradeoff between those, but
> > so are pretty much all other options that don't include the full
> > message-id.
>
> This is a bit of a crazy idea, but in the new list system, couldn't we
> add a header which includes "our" surrogate message-id?  Or possibly the
> entire URL to the message, and maybe the URL for the entire thread?
>

I'd rather not tie those systems in that tightly. I think they are much
better off being de-coupled.

That said, what we could do is invent our own "id". We could either use a
separate surrogate key, or we could use the SHA-1 hash of the message-id,
and stick that in a header, which could then be searched for both locally
and remotely.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/