[HACKERS] Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?
On 31 October 2017 at 07:05, Chris Travers wrote:
> Hi;
>
> After Andres's excellent talk at PGConf we tried benchmarking
> effective_io_concurrency on some of our servers and found that those which
> have a number of NVME storage volumes could not fill the I/O queue even at
> the maximum setting (1000).

And was the system still I/O bound? If the CPU was 100% busy then perhaps Postgres just can't keep up with the I/O system. It depends on the workload though; if you start many very large sequential scans you may be able to push the I/O system harder.

Keep in mind effective_io_concurrency only really affects bitmap heap scans (and to a small degree index scans). It works by issuing posix_fadvise() calls for upcoming buffers one by one. That gets multiple spindles active, but it's not really going to scale to many thousands of prefetches (an effective_io_concurrency of 1000 actually means 7485 prefetches). At some point those I/Os are going to start completing before Postgres even has a chance to start processing the data.

-- greg
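For what it's worth, that 7485 figure falls out of the planner's spindle heuristic, which issues roughly n * H(n) prefetches for effective_io_concurrency = n, H(n) being the n-th harmonic number. A quick sanity check of the arithmetic in plain SQL, assuming that heuristic:

    -- n * H(n) for n = 1000: the prefetch target behind the 7485 figure
    SELECT round(1000 * sum(1.0 / i)) AS prefetch_target
    FROM generate_series(1, 1000) AS i;
    -- => 7485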
Re: [HACKERS] Current int & float overflow checking is slow.
We already know this integer overflow checking is non-standard and compilers keep trying to optimize it out. Our only strategy to defeat that depends on compiler flags like -fwrapv that vary by compiler and may or may not be working on less well-tested compilers. So if there's a nice, readable, and convenient way to portably use CPU flags, that would be brilliant. And I'm not too concerned if it doesn't run on VAX.
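For reference, whatever the implementation -- explicit comparisons, -fwrapv, or CPU overflow flags -- the user-visible contract being protected is the same familiar error:

    -- The behavior the overflow check exists to provide:
    SELECT 2147483647::int4 + 1;
    -- ERROR:  integer out of range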
Re: [HACKERS] On markers of changed data
On 10 October 2017 at 23:50, Stephen Frost wrote:
> Yeah, it sounds interesting, but I was just chatting w/ David about it
> and we were thinking about how checkpoints are really rather often done,
> so you end up with quite a few of these lists being out there.
>
> Now, if the lists were always kept in a sorted fashion, then perhaps we
> would be able to essentially merge-sort them all back together and
> de-dup that way but even then, you're talking about an awful lot if
> you're looking at daily incrementals- that's 288 standard 5-minute
> checkpoints, each with some 128k pages changed, assuming max_wal_size of
> 1GB, and I think we can all agree that the default max_wal_size is for
> rather small systems. That ends up being something around 2MB per
> checkpoint to store the pages in or half a gig per day just to keep
> track of the pages which changed in each checkpoint across that day.

I was actually imagining a bitmap, probably one for each 1GB piece of each table. That's probably how you would maintain this data in memory anyway, and after compression it should be fairly small. You'll probably be modifying the same blocks frequently, or doing bulk loads which touch a consecutive range of blocks. But that's still about the same amount of data.

Probably you don't want to keep every checkpoint anyway, though. The nice thing about the changelists is that they tend to reach a maximum size regardless of how long a time range they span, so if you keep one changelist for every 10 checkpoints, or every 100 checkpoints, you could reduce the storage needs and only lose time precision.

-- greg
Re: [HACKERS] On markers of changed data
On 8 October 2017 at 08:52, Andrey Borodin wrote:
>
> 1. Any other marker would be better (It can be WAL scan during archiving,
> some new LSN-based mechanics, etc.)

The general shape of what I would like to see is some log which lists where each checkpoint starts and ends and which blocks were modified since the previous checkpoint. Then, to generate an incremental backup from any point in time to the current one, you union all the block lists between them and fetch those blocks.

There are other ways of using this aside from incremental backups on disk too -- you could imagine a replica that has fallen behind requesting the block lists and then fetching just those blocks instead of needing to receive and apply all the WAL. Or possibly even making a cost-based decision between the two, depending on which would be faster.

It would also be useful for going in the reverse direction: looking up all the records (or just the last record) that modified a given block. Instead of having to scan all the WAL, you would only need to scan the checkpoint eras that had touched that block.

-- greg
Re: [HACKERS] generated columns
There are some unanswered questions with column grants too. Do we allow granting access to a calculated column which accesses columns the user doesn't have access to? If so, then this is a suitable substitute for using updatable views to handle things like granting users access to password hashes or personal data with the details censored, without giving them access to the unhashed password or the full personal info.

-- greg
Re: [HACKERS] 64-bit queryId?
On 1 October 2017 at 16:40, Tom Lane wrote:
> Greg Stark writes:
>> Indeed. It's simple enough to export stats to prometheus with queryid
>> as the key. Then even if the query ages out of the database stats you
>> have graphs and derivative metrics going further back.
>
> I'm not really ready to buy into that as a supported use-case, because
> it immediately leads to the conclusion that we need to promise stability
> of query IDs across PG versions, which seems entirely infeasible to me
> (at least with anything like the current method of deriving them).

Well, these kinds of monitoring systems tend to be used by operations people, who are a lot more practical and a lot less worried about theoretical concerns like that. What they're going to want to do is things like "alert on any query using more than 0.1 cpu core", or "alert on any query being the top I/O consumer that isn't amongst the top 10 over the previous day", or "alert on any query using more than 4x the cpu or I/O on one database than it does on average across all our databases". That last one is a case where it would be nice if the queryid values were stable across versions, but it's hardly surprising that they aren't.

In context, the point was merely that the default pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit values are enough. It wouldn't be hard for there to be 64k different queries over time and across all the databases in a fleet, and at that point a 32-bit collision becomes likely.

-- greg
Re: [HACKERS] 64-bit queryId?
On 30 September 2017 at 21:03, Alexander Korotkov wrote:
> I heard from customers that they periodically dump contents of
> pg_stat_statements and then build statistics over long period of time. If
> even they leave default pg_stat_statements.max unchanged, probability of
> collision would be significantly higher.

Indeed. It's simple enough to export stats to prometheus with queryid as the key. Then, even if the query ages out of the database stats, you have graphs and derivative metrics going further back.

I have to admit this was my first reaction to the idea of using sha1 hashes for git commits as well. But eventually I came around. If the chances of a hash collision are smaller than a cosmic ray flipping a bit, or digital electronics falling into a metastable state, then I had to admit there's not much value in being theoretically more correct. In practice, if the query has aged out of pg_stat_statements and you're exporting pg_stat_statements metrics to longer-term storage, there's really nothing more "correct" than just using a long enough hash and assuming there are no collisions. If 64 bits is still not sufficient, we could just go to 160-bit sha1 or 256-bit sha256.

Actually, there's a reason I'm wondering whether we shouldn't use a cryptographic hash, or even an HMAC. Currently, if you're a non-superuser we, quite sensibly, hide the query text. But we also hide the queryid. The latter is really inconvenient, since it makes the stats utterly useless. I'm not sure what the rationale was, but the only thing I can think of is a fear that it's possible to reverse-engineer the query using brute force. An HMAC, or for most purposes even a simple cryptographic hash with a secret salt, would make that impossible.

(I have other advances in pg_stat_statements I would love to see. It would be so much more helpful if pg_stat_statements also kept a few examples of query parameters, such as the most recent set, the set that caused the longest execution, and maybe the set with the largest value of each metric.)

-- greg
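As a rough illustration of the HMAC idea -- this is only a sketch using the pgcrypto extension's hmac() function, and is not how pg_stat_statements computes queryid today:

    CREATE EXTENSION IF NOT EXISTS pgcrypto;

    -- Keyed, non-reversible identifier for a normalized query text;
    -- without the per-cluster secret, brute-forcing candidate query
    -- texts to match an exposed id becomes infeasible.
    SELECT encode(hmac('SELECT * FROM t WHERE id = $1',  -- normalized query text
                       'per-cluster secret',              -- hypothetical salt/key
                       'sha256'),
                  'hex') AS opaque_queryid;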
Re: [HACKERS] The case for removing replacement selection sort
On 8 September 2017 at 18:06, Peter Geoghegan wrote:
> * It's still faster with int4/int8 comparisons on modern hardware, and
> I think that most ordered inputs will be of those types in practice.

This may be a bit "how long is a piece of string", but how do those two compare with string sorting in an interesting encoding/locale -- say, /usr/share/dict/polish in pl_PL, for example? It's certainly true that people sort text as well as numbers. Also, people often sort on keys of more than one column.

-- greg
Re: [HACKERS] JIT compiling expressions/deform + inlining prototype v2.0
On 5 September 2017 at 11:58, Konstantin Knizhnik wrote:
>
> I wonder if we can perform some optimization in this case (assuming that in
> typical cases a column either contains mostly non-null values, or mostly
> null values).

If you really wanted to go crazy here, you could do lookup tables on bytes of the null bitmap. I.e., you look at the first byte of the null bitmap, index into an array, and it points to 8 offsets for the 8 fields covered by that much of the bitmap. The lookup table might be kind of large, since offsets are 16 bits, so you're talking 256 * 16 bytes, or 4kB, for every 8 columns up until the first variable-size column (or I suppose you could even continue in the case where the variable-size column is null).

-- greg
[HACKERS] CSV Logging questions
I was just looking over the CSV logging code and have a few questions about why things were done the way they were done.

1) Why do we gather a per-session log line number? Is it just to aid people importing, by letting them avoid duplicate entries from partial files? Is there some other purpose, given that entries will already be sequential in the CSV file?

2) Why is the error location (file, line, function) conditional on log_error_verbosity? Surely the whole point of a structured log is that you can log everything and choose what to display later -- which is why CSV logging doesn't look at log_line_prefix to determine which other bits to display. There's no added cost to include this information unconditionally, and it's far from the largest piece of data being logged either.

3) Similarly, I wonder if the statement should always be included, even when hide_stmt is set, so that users can write sensible queries against the data even if it means duplicating data.

4) Why the session start time? Is this just so that it uniquely identifies a session? Should we perhaps generate a unique session identifier instead?

The real reason I'm looking at this is that I'm looking at the json_log plugin from Michael Paquier. It doesn't have the log line numbers, and I can't figure out whether it should, because I can't quite figure out why they exist in the CSV files. I think there are a few other fields that have been added in Postgres but are missing from the JSON log because of version skew.

I'm wondering if we should abstract out the CSV format, so that instead of using emit_log_hook you would add a new format, and it would specify an "add_log_attribute(key, val)" hook which would get called once per log format. You could then have as many log formats as you want and be sure they would all have the same data. That would also mean the timestamps would be in sync, and we could probably eliminate the occurrences of the wrong format appearing in the wrong logs.

-- greg
Re: [HACKERS] log_destination=file
On 31 August 2017 at 13:49, Tom Lane wrote:
> Magnus Hagander writes:
>> On Thu, Aug 31, 2017 at 2:34 PM, Tom Lane wrote:
> Yes, it's pretty important, because of assorted stuff not-under-our-
> control that doesn't know about ereport and will just print to stderr
> anyway. Some examples: dynamic linker can't-resolve-symbol failure
> messages, glibc malloc corruption error messages, just about any external
> module in plperl or plpython. I don't find this to be negotiable.

So what happens now with these messages? My understanding is that they're missing from the CSV logs and are simply inserted into the text logs without any log_line_prefix? The logging collector doesn't recognize these messages and reformat them for the CSV logs, does it?

I'm actually asking because I'm more concerned with JSON logs or msgpack logs. Currently these are supported with an emit_log_hook, but they can't capture these non-ereport logs either. Also, the CSV and emit_log_hook based logs don't have any convenient way to turn them on and off, or to control the location and filename of the logs. It would be nice if we could have something like:

    log_destinations = 'stderr=text,syslog=json,postgresql-%Y-%m-%d_%H%M%S.csv=csv'

>> Are you actually asking for a benchmark of if logging gets slower?
>
> Yes.

Personally, I don't think it's "performance" so much as operational issues that are more concerning. For all we know there are people out there who tried to use the logging collector, found it didn't work well on some system -- perhaps it interacted badly with systemd or something else -- and switched back to just using stderr. I don't know how to flush these users out, though, if there are any. Making this change early in a release cycle is the best we can do.

-- greg
[HACKERS] signed logging format for pid in log_line_prefix?
Both the text and CSV logging seem to use %d for logging the server pid:

    appendStringInfo(buf, "%d", MyProcPid);

Am I missing something, or wouldn't this mean we print pids with large values as negative numbers? Isn't that strange? Wouldn't we rather use %u here?

-- greg
Re: [HACKERS] generated columns
On 31 August 2017 at 05:16, Peter Eisentraut wrote:
> Here is another attempt to implement generated columns. This is a
> well-known SQL-standard feature, also available for instance in DB2,
> MySQL, Oracle. A quick example:
>
> CREATE TABLE t1 (
>   ...,
>   height_cm numeric,
>   height_in numeric GENERATED ALWAYS AS (height_cm * 2.54)
> );

I only recently discovered we actually already have this feature. Kind of.

stark=# CREATE TABLE t1 (height_cm numeric);
CREATE TABLE
Time: 38.066 ms
stark***=# create function height_in(t t1) returns numeric language 'sql' as 'select t.height_cm * 2.54';
CREATE FUNCTION
Time: 1.216 ms
stark***=# insert into t1 values (2);
INSERT 0 1
Time: 10.170 ms
stark***=# select t1.height_cm, t1.height_in from t1;
┌───────────┬───────────┐
│ height_cm │ height_in │
├───────────┼───────────┤
│         2 │      5.08 │
└───────────┴───────────┘
(1 row)

Time: 1.997 ms

Yours looks better :)

-- greg
Re: [HACKERS] MAIN, Uncompressed?
On 25 August 2017 at 19:59, Simon Riggs wrote:
> On 25 August 2017 at 14:08, Tom Lane wrote:
>> Maybe, but the use case seems mighty narrow.
>
> JSON blobs between 2kB and 8160 bytes are very common.
>
> String length is maybe a poisson distribution, definitely not uniform.

But JSON blobs should be highly compressible. Even jsonb will be quite compressible.

That said, I always found remembering the mapping from these names to the various behaviours quite hard. I would have found it far more useful to have two separate properties I could set, "compress" and "external" -- or, perhaps even more useful, some kind of guideline size threshold for each (and perhaps a second compress threshold and external threshold for the whole tuple).

I think this is a particularly old piece of code, and we're lucky the default heuristics have served well for all this time, because I doubt many people fiddle with these storage attributes. The time may have come to come up with a better UI for the storage attributes, because people are doing new things (like JSON) and wanting more control over this heuristic.

For what it's worth, I think a good start would be to give people more visibility into what the tuptoaster heuristic is actually doing to their data; that will encourage people to give feedback about when they're surprised or frustrated by the existing UI.

-- greg
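For reference, the existing per-column UI being criticized -- each keyword bundles the compression and out-of-line decisions together (the table and column names here are just placeholders):

    ALTER TABLE t ALTER COLUMN doc SET STORAGE PLAIN;     -- no compression, inline only
    ALTER TABLE t ALTER COLUMN doc SET STORAGE MAIN;      -- compression, inline preferred
    ALTER TABLE t ALTER COLUMN doc SET STORAGE EXTERNAL;  -- out-of-line allowed, no compression
    ALTER TABLE t ALTER COLUMN doc SET STORAGE EXTENDED;  -- both (the default)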
Re: [HACKERS] Funny WAL corruption issue
On 10 August 2017 at 15:26, Chris Travers wrote:
>
> The bitwise comparison is interesting. Remember the error was:
>
> pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected
> pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset
> 1146880
...
> Since this didn't throw a checksum error (we have data checksums disabled but
> wal records ISTR have a separate CRC check), would this perhaps indicate that
> the checksum operated over incorrect data?

No checksum error plus this "unexpected pageaddr" doesn't necessarily mean data corruption. It could mean that when the database stopped logging it was reusing a WAL file, and the old WAL stream happened to have a record boundary at the same byte position. So the previous record's checksum passed, and the following record's checksum passes too, but the record header is for a different position in the WAL stream.

I think you could actually hack pg_xlogdump to ignore this condition and keep dumping; you'd see whether the records that follow appear to be old WAL data. I haven't actually tried this, though.

-- greg
Re: [HACKERS] More optimization effort?
On 21 July 2017 at 20:00, Tom Lane wrote:
>> I have, however, decided not to volunteer to be the one who works on
>> that project.
>
> Me either. Any one of these things would require a *lot* of work in
> order to have a coherent feature that provided useful behavior across
> a bunch of different datatypes.

I had in the past idly thought about whether it would be possible to link in one of the various general-purpose theorem-proving libraries and use it to simplify expressions. But I really have no idea how much work it would be to teach one about all the properties and constraints of our existing data types and operators -- or, for that matter, how easy it would be to figure out which theorems we want proven in order to be able to use an index.

-- greg
Re: [HACKERS] [GENERAL] huge RAM use in multi-command ALTER of table heirarchy
On 20 July 2017 at 14:19, Tom Lane wrote:
> Greg Stark writes:
>
>> Would it be useful to keep in one of the memory checking assertion builds?
>
> Why? Code that expects to continue accessing a relcache entry's tupdesc
> after closing the relcache entry is broken, independently of whether it's
> in a debug build or not.

Mea culpa. I hadn't actually read the code, and assumed it would be sensible to change from something that freed these tupdescs into something that raised an assertion if they were still unfreed at end of transaction. Reading the code, I see that it's only there to *avoid* freeing the tupdesc, just in case something is still using it. If we just freed it, the normal memory checking builds would catch cases where it's used after being freed.

But what I still don't understand is whether the fact that it still passes the regression tests means anything. Unless there happened to be a sinval message at the wrong time, the regression tests wouldn't uncover any problem cases. If it passed the tests on a CLOBBER_CACHE_ALWAYS build, would that prove it's safe? Or perhaps, if the columns haven't actually been altered, it would still fail to fail?

-- greg
Re: [HACKERS] [GENERAL] huge RAM use in multi-command ALTER of table heirarchy
On 19 July 2017 at 00:26, Tom Lane wrote:
> It's probably a bit late in the v10 cycle to be taking any risks in
> this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
> as soon as the v11 cycle opens, unless someone can show an example
> of non-broken coding that requires it. (And if so, there ought to
> be a regression test incorporating that.)

Would it be useful to keep in one of the memory checking assertion builds?

-- greg
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
On 15 July 2017 at 23:00, Tom Lane wrote:
> While it's too late in the v10 cycle to do anything very meaningful
> about this now, I am tempted to strengthen the deprecation notice's
> wording from "might disappear" to "will disappear".

"Will certainly disappear at some unspecified date" is a lot less convincing than "will disappear in 2021 in Postgres 14". The specific year and version number are irrelevant, but picking and naming a specific one makes it a lot easier to follow through come that date.

-- greg
Re: [HACKERS] building libpq.a static library
On 12 July 2017 at 16:11, Tom Lane wrote:
> Jeroen Ooms writes:
>
>> This works but it's a bit of a pain to maintain. I was wondering if
>> this hack could be merged so that the standard 'configure
>> --enable-static' script would install a static library for libpq
>> alongside the shared one.
>
> FWIW, we used to have support for building static libpq, but
> we got rid of it a long time ago. I couldn't find the exact
> spot in some desultory trawling of the commit history.

Fwiw, I think the real problem is that building static libraries "properly" requires different compiler options -- notably, they're not normally built with -fPIC. That means building every object twice, which kind of breaks make's build model, with its simple dependency graph where each object appears once. Some packages do this by inventing a foo-shared.o and a foo-static.o, but that introduces its own weirdness.

I don't know what the downsides would be of creating a static library out of objects built with -fPIC. It might just be a small performance penalty, which might be no big deal for libpq. That may be a good compromise.

-- greg
Re: [HACKERS] New partitioning - some feedback
On 10 July 2017 at 23:46, David Fetter wrote:
> On Mon, Jul 10, 2017 at 05:33:34PM -0500, Robert Haas wrote:
>> On Mon, Jul 10, 2017 at 2:15 AM, Amit Langote wrote:
>>> I posted a patch upthread which makes \d hide partitions
>>> (relispartition = true relations) and include them if the newly
>>> proposed '!' modifier is specified. The '+' modifier is being
>>> used to show additional detail of relations chosen to be listed at
>>> all, so it seemed like a bad idea to extend its meaning to also
>>> dictate whether partitions are to be listed.
>>
>> +1. That'd be a mess.
>
> With utmost respect, it's less messy than adding '!' to the already
> way too random and mysterious syntax of psql's \ commands. What
> should '\det!' mean? What about '\dT!'?

Fwiw, as (I believe) the first person to make this complaint, I would be fine with + listing all partitions. Imho, adding an orthogonal '!' would be too much mental overhead for the user.

If you want something different, perhaps we can invent ++ for "even more information" and list partitions only if two pluses are provided. (I don't think the other way around makes sense: you might need a way to list permissions and comments without listing every partition if you're on a system with an unmanageable number of partitions, but you never absolutely need a way to list partitions without the comments and permissions.) At least that doesn't require the user to learn a new flag and how it interacts with everything else.

-- greg
Re: [HACKERS] RFC: Key normalization for nbtree
On 10 July 2017 at 19:40, Peter Geoghegan wrote:
> Key normalization means creating a representation for internal page
> items that we always just memcmp(), regardless of the details of the
> underlying datatypes.

One thing I would like to see is features like this added to the opclasses (or opfamilies?) using standard PG functions that return standard PG data types. If each opclass had a function that took the data type in question and returned a bytea, then you could implement that function in whatever language you liked (in theory), test it using standard SQL, and possibly even find other uses for it.

That kind of abstraction would be more promising for the future than yet another C API that is used for precisely one purpose and whose definition is "provide the data needed for this one usage".

-- greg
Re: [HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?
On 6 July 2017 at 15:29, Jim Finnerty wrote:
>
> Feel free to knock down this 'straw man' and propose something better!

I think the pattern in this design that we don't want is that it imposes extra complexity on every user of every page, even when the page doesn't have the problem and even when the problem isn't anywhere in the database. Even years from now, when this problem is long gone, you'll have code paths for dealing with this special page format that are rarely executed and never tested, and that will have to be maintained blind.

A solution that imposes a cost only on the weird pages, and only temporarily, and that leaves the database in a "consistent" state requiring no special processing when reading the data, would be better. The "natural" solution is what was discussed for incompatible page format changes in the past: a point release of one Postgres version that tries to ensure there's enough space on each page for the next version and keeps track of whether there are any problematic pages. You would then be blocked from upgrading until you had ensured all pages had space (presumably by running some special "vacuum upgrade" or something like that).

Incidentally, it's somewhat intriguing to think about what would happen if we *always* did such a tombstone for deletes. Or perhaps only when it's a full-page write. Since the whole page is going into the log, and that tuple will never be modified again, you could imagine just replacing the tuple with the LSN of the deletion and letting anyone who really needs it fetch it from the xlog. That would be a completely different model from the way Postgres works, though -- more like a log-structured storage system.

-- greg
Re: [HACKERS] WIP patch for avoiding duplicate initdb runs during "make check"
On 2 July 2017 at 18:33, Tom Lane wrote:
> system("cp -a ...") call in favor of something more portable.

If we're ok with using Perl, there's File::Copy::Recursive::dircopy(), which does exactly that.

-- greg
Re: [HACKERS] Notes on testing Postgres 10b1
On 7 June 2017 at 01:01, Josh Berkus wrote:
> P3: apparently jsonb_to_tsvector with lang parameter isn't immutable?
> This means that it can't be used for indexing:
>
> libdata=# create index bookdata_fts on bookdata using gin ((
> to_tsvector('english',bookdata)));
> ERROR: functions in index expression must be marked IMMUTABLE

I don't have a machine handy to check on, but isn't this a strange thing to do? Isn't there a GIN opclass on jsonb itself, which would be the default if you didn't have that to_tsvector() call -- and which would also work properly with the jsonb operators?

-- greg
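A minimal sketch of that alternative, assuming the table and column names from the quoted example:

    -- Default jsonb GIN opclass; supports the jsonb operators (@>, ?, etc.)
    CREATE INDEX bookdata_gin ON bookdata USING gin (bookdata);

    -- Or the smaller, faster opclass if only containment (@>) is needed:
    CREATE INDEX bookdata_gin_path ON bookdata USING gin (bookdata jsonb_path_ops);

    -- Queries like this can then use the index:
    SELECT * FROM bookdata WHERE bookdata @> '{"author": "Austen"}';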
Re: [HACKERS] Hash Functions
On 13 May 2017 at 10:29, Robert Haas wrote:
> - Floats. There may be different representations in use on different
> hardware, which could be a problem. Tom didn't answer my question
> about whether any even-vaguely-modern hardware is still using non-IEEE
> floats, which I suspect means that the answer is "no".

Fwiw, the answer to that is certainly no. The only caveat is that some platforms have incomplete IEEE implementations, missing corner cases such as denormalized values, but I don't think that's something that would change with a different hash function.

Personally, while I would like to avoid code that actively crashes or fails basic tests on a VAX, even I don't think we need to worry about replication or federated queries in a heterogeneous environment where some servers are Vaxen and some are modern hardware. That seems a bit far-fetched.

-- greg
Re: [HACKERS] A design for amcheck heapam verification
On 1 May 2017 at 20:46, Robert Haas wrote:
> One problem is that Bloom filters assume you can get
> n independent hash functions for a given value, which we have not got.
> That problem would need to be solved somehow. If you only have one
> hash function, the size of the required bloom filter probably gets
> very large.

There's a simple formula to calculate the optimal number of hash functions and the size of the filter given a target false-positive rate. But I don't think this is as big a problem as you imagine.

a) We don't really have only one hash function; we have a 32-bit hash function, and we could expand that to a larger bit size if we wanted. Bloom filters are never 2^32-bit arrays, for obvious reasons. If you have a 1-kbit bloom filter, each probe needs only 10 bits of index, so you already have three fully independent hash functions available. If we changed to a 64-bit or 128-bit hash function, you would have enough bits for a larger set of hash functions and a larger array.

b) You can get a poor man's universal hash out of hash_any or hash_int by just tweaking the input value in a way that doesn't interact in a simple way with the hash function. Even something as simple as xoring it with a random number (i.e. a vector of random numbers that identify your randomly chosen distinct "hash functions") seems to work fine.

However, for future-proofing and security hardening, I think Postgres should really implement a mathematically rigorous universal hashing scheme, which provides a family of hash functions to pick from randomly. That prevents users from being able to construct data that intentionally performs poorly in hash data structures, for example. But it also gives you a whole family of hash functions to pick from for bloom filters.

-- greg
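For the record, the sizing formula referred to above: for n elements and a target false-positive rate p, the optimal filter size is m = -n*ln(p)/ln(2)^2 bits, with k = -ln(p)/ln(2) hash functions. A quick back-of-the-envelope calculation in SQL as a sketch:

    -- Bloom filter sizing for n = 1 million elements, p = 1% false positives
    SELECT ceil(-n * ln(p) / power(ln(2.0), 2)) AS m_bits,    -- ~9.6M bits (~1.2 MB)
           ceil(-ln(p) / ln(2.0))               AS k_hashes   -- 7
    FROM (VALUES (1000000.0, 0.01)) AS params(n, p);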
Re: [HACKERS] Logical replication in the same cluster
On 1 May 2017 at 19:24, Andres Freund wrote:
>> There is no inherent reason why the CREATE INDEX CONCURRENTLY style of
>> using multiple transactions makes it necessary to leave a mess behind
>> in the event of an error or hard crash. Is someone going to get around
>> to fixing the problem for CREATE INDEX CONCURRENTLY (e.g., having
>> extra steps to drop the useless index during recovery)? IIRC, this was
>> always the plan.
>
> Doing catalog changes in recovery is fraught with problems. Essentially it
> requires starting one worker per database, before allowing access.

The "plan" was to add more layers of PG_TRY and transactions so that if there was an error while building the index, all the remnants of the failed build got cleaned up. But when I actually tried to do it, the problem seemed to metastasize: it was going to require two or three layers of messy nested PG_TRY and extra transactions. Perhaps there's a cleaner way to structure it and I should look again.

I don't recall ever having a plan to do anything in recovery. I think we did talk about why it was hard to mark hash indexes invalid during recovery, which was probably the same problem.

-- greg
Re: [HACKERS] multithreading in Batch/pipelining mode for libpq
On 21 April 2017 at 21:31, Ilya Roublev wrote:
> What I need is to make a huge amount of inserts

This may be a silly question, but I assume you've already considered using server-side COPY? That's currently the most efficient way to load a lot of data.

-- greg
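A minimal sketch of what that looks like (the table and file path are hypothetical):

    -- Server-side COPY reads the file directly on the server:
    COPY big_table (id, payload) FROM '/tmp/data.csv' WITH (FORMAT csv);

    -- From psql, the client-side variant streams over the connection instead:
    -- \copy big_table (id, payload) FROM 'data.csv' WITH (FORMAT csv)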
Re: [HACKERS] Highly Variable Planning Times
On 19 April 2017 at 22:39, Michael Malis wrote:
>> *At best*, you're doing substantial work in the
>> planner to avoid the first tree descent step or two in a single
>> non-partial index.

Fwiw, in addition to replacing the first few levels of the descent with planning-time work, there's also an advantage due to the smaller keys. Effectively these partial indexes are emulating prefix compression in the btree.

> While the specific example I gave in the post could be replaced with a
> non-partial index, most of the partial indexes contain predicates that
> are not straightforward to index with non-partial indexes. About 85%
> of the partial indexes contain a regular expression in them for CSS
> selector matching. I've tried using a trigram GIN index, but it wound
> up not working too well due to the data being highly redundant (almost
> every CSS hierarchy contains 'div' in it). Additionally, most of the
> predicates for partial indexes are extremely specific making the
> partial indexes very small. The sum total size of all of the partial
> indexes is still much smaller than if we were to index every necessary
> field with regular indexes.

I wonder if you could implement an FTS parser that tokenized HTML into just the tokens representing the matching criteria. A GIN index using such a parser would actually be very similar to what you have, as GIN indexes are basically a collection of btrees...

The operational problem with that is that I think it would be even harder to update a parser than to add a new partial index. I don't think you can effectively upgrade a parser to include new tokens without rebuilding any indexes using it. If you wanted to add new selector criteria live, you would probably end up deploying the new parser, building a new index with CREATE INDEX CONCURRENTLY using the new parser, and then dropping the old index.

I'm not sure it's possible to write an FTS parser for handling arbitrary CSS selectors, but if you managed it, that would be a very valuable addition to Postgres, IMHO.

-- greg
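For concreteness, a sketch of the kind of partial index being described (the column name and pattern are invented for illustration):

    -- One tiny index per selector pattern; the predicate does the
    -- matching, so only rows for this selector are indexed at all.
    CREATE INDEX page_events_div_promo ON page_events (occurred_at)
    WHERE css_hierarchy ~ 'div\.promo-banner';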
Re: [HACKERS] Some thoughts about SCRAM implementation
On 14 April 2017 at 20:20, Peter Eisentraut wrote:
>
> Yeah, I think if you're concerned about MITM then you would also be
> concerned about MITM siphoning off your data. So you should be using
> TLS and then you don't need channel binding.

No. You can use TLS for authentication (by verifying SSL certs in both directions), in which case TLS will protect against MITM for you. But if you use TLS only for encryption and still use passwords for authentication, then there's no protection against MITM, because you don't know that the party doing the encryption is the same as the one you authenticated to.

Channel binding is all about tying the authentication mechanism to the encryption, to guarantee that the party doing the encryption is the same as the party you authenticated to. Otherwise someone could MITM the TLS connection and relay the raw bytes of the scram negotiation. Under our md5 auth that gives them your password; under scram they won't get the password, which is a huge improvement, but they would still have the raw unencrypted data from your traffic.

-- greg
Re: [HACKERS] Variable substitution in psql backtick expansion
On 2 April 2017 at 07:53, Fabien COELHO wrote:
> Note that this is already available indirectly, as shown in the
> documentation.
>
>   SELECT some-boolean-expression AS okay \gset
>   \if :okay
>     \echo boolean expression was true
>   \else
>     \echo boolean expression was false
>   \endif

Am I the only one who thinks that even if \if got the ability to evaluate arbitrary SQL queries, I would probably still always write things as above? I think putting arbitrary SQL expressions (let alone queries) in \if would make scripts a total mess, and impossible for humans to parse. Whereas storing the results in psql variables and then using those variables in \if keeps even fairly complex queries and nested \if structures straightforward. It also makes it far clearer in what order the queries will be evaluated and under which set of conditions.

I don't think taking a simple command-line execution environment like psql and trying to embed a complete complex language parser in it is going to result in a sensible programming environment. Having a simple \if is already pushing it. If someone wants anything more complex, I would strongly recommend switching to perl or python rather than trying to code up arbitrary SQL in nested expressions.

-- greg
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 4 April 2017 at 17:10, Maksim Milyutin wrote:
>
> 3. As I noticed early pg_depend table is used for cascade deleting indexes
> on partitioned table and its children. I also use pg_depend to determine
> relationship between parent and child indexes when reindex executes
> recursively on child indexes.
>
> Perhaps, it's not good way to use pg_depend to determine the relationship
> between parent and child indexes because the kind of this relationship is
> not defined. I could propose to add into pg_index table specific field of
> 'oidvector' type that specify oids of dependent indexes for the current
> local index.

Alternatively, you could have a single oid in pg_index on each of the children that specifies which local index is its parent. That would probably require a new index on that column so you could look up all the children efficiently. I think it would behave more sensibly when you're adding or removing a partition, especially if you want to add many partitions in parallel using multiple transactions. An oidvector of children would effectively mean you could only do one partition creation or deletion at a time.

-- greg
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On 21 March 2017 at 20:04, Bruce Momjian wrote:
> Yes, but once it is written it will take years before those bits can be
> used on most installations.

Well, the problem isn't most installations. On most installations it should be pretty straightforward to check the oldest database xid and compare that to when the database was migrated to post-9.0. (Actually there may be some additional code to write, but it's just ensuring that the bits are actually cleared and not merely ignored -- and databases do generally need to be vacuumed more often than on the order of years.)

The problem is that somebody tomorrow could upgrade an 8.4 database to 10.0. In general, even versions we no longer support get extra support for migrating away from; I assume it's better to help people upgrade than to leave them on unsupported versions. And even if you're not concerned about 8.4, someone could still be upgrading 9.4 for years to come.

It probably does make sense to pick a version, say 10.0, and have it go out of its way to ensure it cleans up the MOVED_IN/MOVED_OFF bits, so that we can be sure any database pg_upgraded from 10.0+ doesn't have any left. Then at least we'll know when the bits are available again.

-- greg
Re: [HACKERS] [PATCH] SortSupport for macaddr type
On 18 March 2017 at 22:22, Peter Geoghegan wrote:
>
> Out of idle curiosity, I decided to generate disassembly of both
> macaddr_cmp_internal(), and the patch's abbreviated comparator. The
> former consists of 49 x86-64 instructions at -O2 on my machine,
> totaling 135 bytes of object code. The latter consists of only 10
> instructions, or 24 bytes of object code.

I wonder if there's something that could be optimized out of the normal comparison function, but we're defeating some compiler optimizations with all our casts and aliasing.

-- greg
Re: [HACKERS] [POC] hash partitioning
On 2 March 2017 at 13:03, amul sul wrote:
> create table foo (a integer, b text) partition by hash (a);
> create table foo1 partition of foo with (modulus 4, remainder 0);
> create table foo2 partition of foo with (modulus 8, remainder 1);  -- legal, modulus doesn't need to match
> create table foo3 partition of foo with (modulus 8, remainder 4);  -- illegal, overlaps foo1

Instead of using modulus, why not just divide up the range of hash keys using ranges? That should be just as good for a good hash function (effectively using the high bits instead of the low bits of the hash value), and it would mean you could reuse the machinery for list partitioning for partition exclusion.

It also has the advantage that it's easier to see how to add more partitions: you just split a range (and migrate the data...). There's even the possibility of having uneven partitions if you have data distribution skew -- which can happen even with a good hash function. In a degenerate case you could have a partition for the single hash value of a particularly common key, and a reasonable number of partitions for the remaining hash ranges.

-- greg
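A sketch of what that could look like with the existing range-partitioning machinery (PostgreSQL 10 syntax, using the built-in hashint4() function and UNBOUNDED range ends; purely illustrative, not the proposed feature):

    CREATE TABLE foo (a integer, b text) PARTITION BY RANGE ((hashint4(a)));

    -- Split the signed 32-bit hash space into ranges; splits could be
    -- uneven if the data distribution is skewed.
    CREATE TABLE foo1 PARTITION OF foo FOR VALUES FROM (UNBOUNDED) TO (-1073741824);
    CREATE TABLE foo2 PARTITION OF foo FOR VALUES FROM (-1073741824) TO (0);
    CREATE TABLE foo3 PARTITION OF foo FOR VALUES FROM (0) TO (1073741824);
    CREATE TABLE foo4 PARTITION OF foo FOR VALUES FROM (1073741824) TO (UNBOUNDED);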
Re: [HACKERS] UPDATE of partition key
On 24 February 2017 at 14:57, David G. Johnston wrote:
> I dislike an error. I'd say that making partition "just work" here is
> material for another patch. In this one an update of the partition key can
> be documented as shorthand for delete-returning-insert with all the
> limitations that go with that. If someone acceptably solves the ctid
> following logic later it can be committed - I'm assuming there would be no
> complaints to making things just work in a case where they only sorta
> worked.

Personally, I don't think there's any hope that there will ever be cross-table ctid links. Maybe one day there will be a major new table storage format with very different capabilities, but in the current architecture it seems like an impossible leap.

I would expect everyone to come to terms with the basic idea that partition key updates are always going to be a corner case. The user defined the partition key, and the docs should carefully explain the impact of that definition. As long as that explanation gives them something they can work with and lets them manage the consequences, that's fine.

What I'm concerned about is that silently giving "wrong" answers in regular queries -- not even the ones doing the partition key updates -- is something the user can't manage. They have no way to rewrite the query to avoid the problem if some other user or part of their system is updating partition keys. They have no way to know the problem is even occurring.

Just to spell it out: it's not just "no-op updates" where the user sees 0 records updated. If I update all records where username='stark', perhaps to set the "user banned" flag, and get back "9 records updated", and later find out that I missed a record because someone changed the department_id while my query was running -- how would I even know? How could I possibly rewrite my query to avoid that?

The reason I suggested throwing a serialization failure was that I thought it would be the easiest shortcut to the problem. I had imagined that having a bit pattern indicating such a move would be a pretty minor change. I would actually consider using a normal update bitmask with InvalidBlockId in the ctid to indicate that the tuple was updated and the target of the chain isn't available. That may be something we'll need in the future for other cases too.

Throwing an error means the user has to retry their query, but that's at least something they can do. Even if they don't do it automatically, the ultimate user will probably just retry whatever operation errored out anyway. But at least their database isn't logically corrupted.

-- greg
Re: [HACKERS] Make subquery alias optional in FROM clause
On 23 February 2017 at 13:27, Greg Stark wrote:
> => SELECT "?column"? FROM (select 1+1 as "?column?", 1+1) AS x;

Oops, I missed the typo there:

=> SELECT "?column?" FROM (select 1+1 AS "?column?", 1+1) AS x;
ERROR:  42702: column reference "?column?" is ambiguous
LINE 2: SELECT "?column?" FROM (select 1+1 AS "?column?", 1+1) AS x;
               ^
LOCATION:  scanRTEForColumn, parse_relation.c:669

-- greg
Re: [HACKERS] Make subquery alias optional in FROM clause
On 22 February 2017 at 15:08, Tom Lane wrote:
> Indeed. When I wrote the comment you're referring to, quite a few years
> ago now, I thought that popular demand might force us to allow omitted
> aliases. But the demand never materialized. At this point it seems
> clear to me that there isn't really good reason to exceed the spec here.
> It just encourages people to write unportable SQL code.

Oh my. This bothers me all the time. I always assumed the reason it was like this was that the grammar would be ambiguous without the alias and would require extreme measures to hack around. If it's this easy, I would totally be for it.

Offhand, I think there are plenty of solutions for the problem of inventing names, and I suspect any of them would work fine:

1) Don't assign a name -- I would guess this would require some adjustments in the rule deparsing (i.e. views).

2) Assign a name, but add a flag indicating the name is autogenerated, shouldn't be used for resolving references, and shouldn't be dumped. Then it shouldn't really matter if there's a conflict, since the name would only be used for things like error messages, not for resolving references.

3) Thumb through all the names in the query and pick one that doesn't conflict.

For what it's worth, while it wouldn't be a *bad* thing to avoid conflicts, I think this is being held to an inconsistent standard. It's not like there aren't similar situations elsewhere in the codebase where we just don't worry about this kind of thing:

=> SELECT "?column"? FROM (select 1+1 as "?column?", 1+1) AS x;
ERROR:  42703: column "?column" does not exist
LINE 2: SELECT "?column"? FROM (select 1+1 as "?column?", 1+1) AS x;
               ^
HINT:  Perhaps you meant to reference the column "x.?column?" or the column "x.?column?".

-- greg
Re: [HACKERS] Sum aggregate calculation for single precsion real
On 15 February 2017 at 12:52, Robert Haas wrote:
> Personally, I find it somewhere in the middle: I think the way it
> works now is reasonable, and I think what he wants would have been
> reasonable as well. However, I find it hard to believe it would be
> worth changing now on backward compatibility grounds.

Honestly, I was quite surprised to discover that sum(float4) worked this way, because I assumed it followed the same pattern as the integer types. But I wonder what you mean by backward compatibility grounds. If sum(float4) returned a float8, the only compatibility issue would be someone who did CREATE TABLE AS and expected to get a float4 column but got a float8 column instead. That seems like a much more minor corner case than most any other backward-incompatible change in any release.

Moreover, it wouldn't be hard to make sum(float4) use a float8 as an accumulator and then cast to float4 for the final state. That would be 100% compatible with the existing behaviour, aside from producing more accurate results.

(Though as an aside, I think Konstantin would be much better served by using integers and storing cents, or whatever unit of currency is small enough. That would actually give accurate results, which neither float4 nor float8 guarantees.)

-- greg
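A sketch of that accumulator idea using only existing catalog functions (the aggregate name is invented, and this isn't a proposal for this exact definition):

    -- Accumulate in double precision, cast back to real at the end.
    CREATE AGGREGATE sum_via_float8 (float4) (
        SFUNC     = float84pl,  -- float8 state + float4 input -> float8
        STYPE     = float8,
        FINALFUNC = float4      -- cast the final state back down to float4
    );

    SELECT sum_via_float8(x) FROM measurements;  -- hypothetical table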
Re: [HACKERS] UPDATE of partition key
On 13 February 2017 at 12:01, Amit Khandekar wrote:
> There are a few things that can be discussed about:

If you do a normal update, the new tuple is linked to the old one using the ctid, forming a chain of tuple versions. This tuple movement breaks that chain. So the question I had reading this proposal is: what behaviour depends on the ctid, and how is it affected by the chain being broken?

I think the concurrent update case is just a symptom of this. If you try to update a row that's locked for a concurrent update, you normally wait until the concurrent update finishes, then follow the ctid chain, recheck the WHERE clause on the target of the link, and if it still matches, perform the update there. At least you do that if you have the isolation level set to repeatable read. If the isolation level is set to serializable, you just fail with a serialization failure.

I think that's what you should do if you come across a row that's been updated with a broken ctid chain, even in repeatable read mode: just fail with a serialization failure, and document that for partitioned tables, if you perform updates that move tuples between partitions, you need to ensure your updates are prepared for serialization failures.

I think this would require another bit in the tuple infomask indicating that this tuple is the last version before a broken ctid chain -- i.e. that it was updated by moving it to another partition. Maybe there's some combination of bits you could use, though, since this is only needed in a particular situation.

Offhand, I don't know what other behaviours are dependent on the ctid chain. I think you need to go search the docs -- and probably the code, just to be sure -- for any references to ctid, to ensure you catch every impact of breaking the chain.

-- greg
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
On 11 February 2017 at 23:45, Corey Huinker wrote:
> So you'd just want to know nesting depth, with no indicator of true/false?

Even that's more than bash does, for example:

    $ if true ; then
    > if false ; then
    > :
    > fi
    > fi

I'm a bit confused about how the true/false is actually valuable. It doesn't tell you how the expression evaluated, just where you are in the code you're typing in, which you can tell equally well by looking at what code you're typing in. The reason the nesting level is handy is just to remind you in case you forget.

For debugging scripts it would be handy to have some way to tell whether the \if expression actually evaluated to true or false, but that wouldn't be in the prompt, I don't think.

-- greg
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
On 10 February 2017 at 21:36, Fabien COELHO wrote:
>> command     prompt is now
>> ---         ---
>> \echo bob   ''        = initial state, no branch going on at all
>> \if yes     't'       = inside a true branch
>> \if no      'tf'      = false inside a true
>> \endif      't'       = back to just the true branch
>> \if yes     'tt'
>> \if yes     'ttt'
>> \if yes     '...ttt'  = only show the last 3, but let it be known that
>>                         there's at least one more
>> \else       '...ttz'  = past the point of a true bit of this branch
>
> I like the "tfz" idea. I'm not sure whether the up to 6 characters is a
> good idea, though.

I haven't been following this thread, but just skimming through it for the first time, this struck me as more baroque than I was expecting. I was expecting something like a '{' for each level of nested \if you're in, so you can see how many levels deep you are. I didn't expect to see anything more complex than that.

-- greg
Re: [HACKERS] GSoC 2017
On 27 January 2017 at 14:52, Thomas Kellerer wrote:
>
> I don't have the exact syntax at hand, but it's something like this:
>
> create distinct type customer_id_type as integer;
> create distinct type order_id_type as integer;
>
> create table customers (id customer_id_type primary key);
> create table orders (id order_id_type primary key, customer_id
> customer_id_type not null);

That seems like a useful thing, but it's not exactly the same use case. Measurements with units and currency amounts both have the property that you are likely to want a single column that uses different units for different rows. You can aggregate across them without converting, as long as you have an appropriate WHERE clause or GROUP BY clause -- GROUP BY units_of(debit_amount), for example.

-- greg
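One way to sketch that "single column, mixed units" shape with today's tools (all names here are hypothetical):

    -- Tag every amount with its unit...
    CREATE TYPE measured AS (amount numeric, unit text);
    CREATE TABLE ledger (debit measured);

    -- ...and aggregate without converting, by grouping on the unit:
    SELECT (debit).unit, sum((debit).amount) AS total
    FROM ledger
    GROUP BY (debit).unit;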
Re: [HACKERS] GSoC 2017
On 24 January 2017 at 03:42, Peter van Hardenberg wrote:
> The basic concept is that the value of a currency type is that it would
> allow you to operate in multiple currencies without accidentally adding
> them. You'd flatten them to a single type if, when, and how you wanted for any
> given operation but could work without fear of losing information.

I don't think this even needs to be tied to currencies. I've often thought this would be generally useful for any value with units. It would prevent you from accidentally adding miles to kilometers or hours to parsecs, which is just as valid as preventing you from adding CAD to USD.

Then you could imagine having a few entirely optional helper functions that could automatically provide conversion factors using units.dat or currency exchange rates. But even if you don't use these helper functions the types would still be useful.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
On 25 January 2017 at 20:06, Jim Nasby wrote:
> GUCs support SET LOCAL, but that's not the same as local scoping because the
> setting stays in effect unless the subtransaction aborts. What I'd like is the
> ability to set a GUC in a plpgsql block *and have the setting revert on
> block exit*.

I'm wondering which GUCs you have in mind to use this with, because what you're describing is dynamic scoping, and I'm wondering if what you're really looking for is lexical scoping. That would be more in line with how PL/PgSQL variables are scoped and with how #pragmas usually work. But it would probably not be easy to reconcile with how GUCs work.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On 26 January 2017 at 01:58, Thomas Munro wrote:
>
> I don't know how comparable it is to our checksum technology, but
> MySQL seems to have some kind of checksums on table data, and you can
> find public emails, blogs etc lamenting corrupted databases by
> searching Google for the string "InnoDB: uncompressed page, stored
> checksum in field1" (that's the start of a longer error message that
> includes actual and expected checksums).

I'm not sure what exactly that teaches us, however. I see these were often associated with software bugs (apparently MySQL long assumed that a checksum of 0 never happened, for example). Every non-software case I stumbled across seemed to follow a power failure. Apparently MySQL uses a "doublewrite buffer" to protect against torn pages, but when I search for that I get tons of people inquiring how to turn it off... So even without software bugs in the checksum code I don't know that the frequency of the error necessarily teaches us anything about the frequency of hardware corruption either.

And more to the point, it seems what people are asking for in all those lamentations is how they can convince MySQL to continue and ignore the corruption. A typical response was "We slightly modified innochecksum and added option -f that means if the checksum of a page is wrong, rewrite it in the InnoDB page header." Which begs the question...

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
I tend to agree. But in the past when this came up people pointed out you could equally keep the hard-coded check and still grant all the access you wanted using SECURITY DEFINER. Arguably that's a better approach, because then instead of auditing the entire monitoring script you only need to audit this one wrapper function, pg_ls_monitor_dir(), which just calls pg_ls_dir() on this one directory.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online enabling of page level checksums
On Jan 22, 2017 11:13 AM, "Magnus Hagander" wrote:

Yes, this means the entire db will end up in the transaction log since everything is rewritten. That's not great, but for a lot of people that will be a trade they're willing to make since it's a one-time thing. Yes, this background process might take days or weeks - that's OK as long as it happens online.

I'm not sure that's actually necessary. You could just log a WAL record saying "checksum this block" and if it gets replayed then recalculate the checksum on that block again. This record could be exempt from the usual rules for having a full-page write; there's no danger of torn pages from the checksum alone.

The danger would be if some other operation dirties that page: then you need to know that the page is in this weird in-between state where it's dirty but hasn't yet had a full-page write. I'm not sure whether it's worth the infrastructure to have such a state just for this or not. On the other hand it sounds like something that would be useful. A sketch of what I mean is below.
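To illustrate, here's roughly the shape I have in mind for the redo side. This is a hypothetical sketch -- the record name, fields, and helper are all invented for illustration and are not the real xlog APIs:

/* hypothetical WAL record: "recompute the checksum of this block" */
typedef struct xl_checksum_block
{
    unsigned    spcNode;        /* stand-ins for the block's identity */
    unsigned    dbNode;
    unsigned    relNode;
    unsigned    blkno;
} xl_checksum_block;

/*
 * Hypothetical redo routine.  No full-page image is needed: if the write
 * was torn, the only change being replayed is the checksum itself, so
 * reading whatever made it to disk and recomputing the checksum over it
 * yields a correct page again.
 */
static void
checksum_block_redo(const xl_checksum_block *xlrec)
{
    (void) xlrec;
    /* read the block, recompute and set its checksum, mark the buffer dirty */
}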
Re: [HACKERS] RustgreSQL
On 8 January 2017 at 21:50, Jim Nasby wrote:
> Somewhat related to that... it would be useful if Postgres had "fenced"
> functions; functions that ran in a separate process and only talked to a
> backend via a well defined API (such as libpq). There's two major advantages
> that would give us:

The problem with this is that any of the "interesting" extensions need to use the server API. That is, they need to be able to do things like throw errors, expand toast data, etc.

IMHO just about anything you could do in an external process would be something you could much more easily and conveniently do in the client. And it would be more flexible and scalable as well, as it's a lot easier to add more clients than it is to scale up the database.

That said, there were several pl language implementations that worked this way. IIRC one of the Java pl languages ran in a separate Java process.

I think the solution to the problem you're describing is the project formerly known as NaCl: https://en.wikipedia.org/wiki/Google_Native_Client

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
On 8 January 2017 at 17:26, Greg Stark wrote:
> On 5 January 2017 at 19:01, Andres Freund wrote:
>> That's a bit odd - shouldn't the OS network stack take care of this in
>> both cases? I mean either is too big for TCP packets (including jumbo
>> frames). What type of OS and network is involved here?
>
> 2x may be plausible. The first 128k goes out, then the rest queues up
> until the first ack comes back. Then the next 128kB goes out again
> without waiting... I think this is what Nagle is supposed to actually
> address but either it may be off by default these days or our usage
> pattern may be defeating it in some way.

Hm. That wasn't very clear, and the more I think about it, it's not right.

The first block of data -- one byte in the worst case, 128kB in our case -- gets put in the output buffers and, since there's nothing stopping it, it immediately gets sent out. Then all the subsequent data gets put in output buffers but buffers up due to Nagle, until there's a full packet of data buffered, the ack arrives, or the timeout expires, at which point the buffered data drains efficiently in full packets. Eventually it all drains away, and the next 128kB arrives and is sent out immediately. So most packets are full size, with the occasional 128kB packet thrown in whenever the buffer empties. And I think even when the 128kB packet is pending, Nagle only stops small packets, not full packets, and the window should allow more than one packet of data to be pending.

So, uh, forget what I said. Nagle should be our friend here. I think you should get network dumps and use xplot to understand what's really happening, c.f. https://fasterdata.es.net/assets/Uploads/20131016-TCPDumpTracePlot.pdf

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
On 5 January 2017 at 19:01, Andres Freund wrote:
> That's a bit odd - shouldn't the OS network stack take care of this in
> both cases? I mean either is too big for TCP packets (including jumbo
> frames). What type of OS and network is involved here?

2x may be plausible. The first 128k goes out, then the rest queues up until the first ack comes back. Then the next 128kB goes out again without waiting... I think this is what Nagle is supposed to actually address but either it may be off by default these days or our usage pattern may be defeating it in some way.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_activity.waiting_start
On 6 January 2017 at 02:59, Bruce Momjian wrote:
>
> Agreed. No need in adding overhead for short-lived locks because the
> milli-second values are going to be meaningless to users. I would be
> happy if we could find some weasel value for non-heavyweight locks.

For what it's worth I don't think this is true. It may be true that we don't need precise measurements of short-lived locks, but we do need accurate measurements even if they're in the expected range. What users need to know is, in aggregate, how much of the time the database spends working on their queries goes into each state. Even if no LWLock is held for more than a few milliseconds at a time, if it turns out that 80% of the time is being spent waiting on LWLocks then there's no point in worrying about cpu speed, for example. And knowing *which* LWLock is taking up an aggregate of 80% of the time would point at either configuration changes or code changes to reduce that contention.

I would actually argue the reverse of the above proposal would be more useful. What we need are counts of how often LWLocks take longer than, say, 50ms, and for shorter waits we need to know how long they took -- perhaps not precisely for individual waits, but in aggregate we need the totals to be right, so as long as the measurements are accurate that would be sufficient. An accurate but imprecise measurement, +/- 10ms, with low overhead is better than a precise measurement with high overhead. Something like the sketch below.
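Concretely -- and this is just an illustrative sketch with invented names, not anything from the actual lwlock code -- the bookkeeping I'm describing is no more than this:

#include <stdint.h>

#define LONG_WAIT_THRESHOLD_US 50000    /* 50ms */

typedef struct LWLockWaitStats
{
    uint64_t    total_wait_us;  /* the aggregate total must be accurate... */
    uint64_t    wait_count;     /* ...even if each sample is only +/- 10ms */
    uint64_t    long_waits;     /* count of waits over the threshold */
} LWLockWaitStats;

static inline void
record_wait(LWLockWaitStats *stats, uint64_t wait_us)
{
    stats->total_wait_us += wait_us;
    stats->wait_count++;
    if (wait_us > LONG_WAIT_THRESHOLD_US)
        stats->long_waits++;
}

One such struct per LWLock (or per tranche) would be enough to answer "which lock is eating 80% of our time" without any per-wait precision.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers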
Re: [HACKERS] Cluster wide option to control symbol case folding
On 25 December 2016 at 09:40, Lewis, Ian (Microstar Laboratories) wrote:
> So, the current behavior already breaks many tools unless one accepts
> that all symbols on the server are lower case. At root, based on reading
> the threads you provided, this probably indicates defects in the tools,
> rather than a problem with PostgreSQL. My reading of the standard text
> quoted in various places is that any mixed case identifier returned from
> the catalog has to be quoted to match in a query (whether you fold to
> lower or upper case).

Well, tools that work with user-defined columns and make assumptions that they don't require quoting are just buggy. But the problem with configurable quoting rules is a bit different.

Imagine your application later decides to depend on PostGIS. So you load the PostGIS extension and perhaps also some useful functions you found on Stack Overflow for solving some GIS problem you have. Those extensions will create objects and then work with those objects and may use CamelCase for clarity -- in fact I think PostGIS functions are documented as CamelCase. The PostGIS extensions might not work on your system with different case rules if they haven't been 100% consistent with their camelCasing, and the functions from Stack Overflow would be even less likely to work.

If there was some way to scope this setting lexically so it only affected code that's defined in a specific place, that might be safer. But I don't think things are currently organized that way. If you're only concerned with server-side functions it wouldn't be hard to have a specific pl language that was case sensitive, though it might be tricky to do for pl/pgsql due to the way pl/pgsql depends on the sql parser.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_activity.waiting_start
On Dec 24, 2016 5:44 PM, "Tom Lane" wrote:

I think we'd need at least an order of magnitude cheaper to consider putting timing calls into spinlock or lwlock paths, and that's just not available at all, let alone portably.

For spinlocks we could conceivably just bite the bullet and use a raw rdtsc, or the equivalent for other platforms. It might be pretty easy to distinguish sane numbers from numbers that result from a process reschedule, and we could just discard the data when that happens (or count occurrences). Something like the sketch below.

That may possibly work for spinlocks, but it won't work for anything heavier, where process reschedules are routine.
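For illustration, a minimal sketch of that idea using the GCC/Clang __rdtsc() intrinsic (x86-only; the cutoff constant and all the names here are made up for the example):

#include <stdint.h>
#include <x86intrin.h>

/* anything this long almost certainly straddled a reschedule */
#define RESCHEDULE_CUTOFF_CYCLES 1000000

typedef struct SpinTimingStats
{
    uint64_t    total_cycles;   /* believed-sane samples only */
    uint64_t    samples;
    uint64_t    discarded;      /* samples that hit a reschedule */
} SpinTimingStats;

static inline void
record_spin_sample(SpinTimingStats *stats, uint64_t start, uint64_t end)
{
    uint64_t    elapsed = end - start;

    if (elapsed > RESCHEDULE_CUTOFF_CYCLES)
        stats->discarded++;     /* don't let a context switch pollute the data */
    else
    {
        stats->total_cycles += elapsed;
        stats->samples++;
    }
}

Usage would be: take __rdtsc() before acquiring the spinlock, take it again after releasing, and feed both to record_spin_sample().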
Re: [HACKERS] gettimeofday is at the end of its usefulness?
On Dec 26, 2016 10:35 PM, "Tom Lane" wrote:

So it seems like the configure support we'd need is to detect whether clock_gettime is available (note on Linux there's also a library requirement, -lrt), and we would also need a way to provide a platform-specific choice of clockid; we at least need enough smarts to use CLOCK_MONOTONIC_RAW on macOS.

This seems like something that really should be checked at runtime. It's very specific to the particular kernel you're running on, not the build environment, and it can hopefully be measured in only a second or even a fraction of a second. The only problem would be if other things running on the system made the test results unpredictable, so that you had a small chance of getting a very suboptimal choice and were re-rolling the dice each time you restarted...
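Something along these lines is what I mean -- a rough standalone sketch with an arbitrary iteration count and candidate list, which just times how expensive each clockid the running kernel actually supports is (on older glibc, link with -lrt):

#include <stdio.h>
#include <time.h>

static long long
probe_cost_ns(clockid_t id)
{
    struct timespec t0, t1, tmp;
    int         i;

    if (clock_gettime(id, &tmp) != 0)
        return -1;              /* this clock isn't available on this kernel */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < 100000; i++)
        clock_gettime(id, &tmp);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);
}

int
main(void)
{
    printf("CLOCK_MONOTONIC:     %lld ns\n", probe_cost_ns(CLOCK_MONOTONIC));
#ifdef CLOCK_MONOTONIC_RAW
    printf("CLOCK_MONOTONIC_RAW: %lld ns\n", probe_cost_ns(CLOCK_MONOTONIC_RAW));
#endif
    return 0;
}

The startup code would run each probe once and pick whichever supported clock is cheapest, perhaps with a tie-breaking preference for higher resolution.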
Re: [HACKERS] Declarative partitioning - another take
Just poking around with partitioning. I notice that "\d parent" doesn't list all the partitions, suggesting to use \d+, but a plain "\d" does indeed list the partitions. That seems a bit strange, and also probably impractical if you have hundreds or thousands of partitions.

Has this come up in previous discussions? Unfortunately it's proving a bit hard to search for "\d" :/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb problematic operators
On 12 December 2016 at 04:59, Craig Ringer wrote:
> I didn't realise Pg's use of ? was that old, so thanks. That makes
> offering alternatives much less appealing.

One option might be for Postgres to define duplicate operator names using ¿ or something else. I think ¿ is a good choice because it's a common punctuation mark in Spanish, so it's probably not hard to find on a lot of keyboards or hard to find instructions on how to type one.

There is always a risk in allowing redundant syntaxes though. For example, people running grep to find all uses of an operator will miss the alternate spelling. There may even be security implications to that, though to be honest that seems unlikely in this case.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate connection handling from backends
On 5 December 2016 at 19:48, Jim Nasby wrote:
> One solution to this would be to segregate connection handling from actual
> backends, somewhere along the lines of separating the main loop from the
> switch() that handles libpq commands. Benefits:

I'm kind of mystified how a simple code restructuring could solve the fundamental problems with a large number of backends. It sounds like what you're describing would just push the problem around: you would end up with some other maximum instead, max_backends, or max_active_backends, or something like that, with the same problems. At best it would help people who have connection pooling or only a few connections active at any given time.

Heikki's work with CSNs would actually address the main fundamental problem. Instead of having to scan PGPROC when taking a snapshot, taking a snapshot would be O(1). There might need to be scans of the list of active transactions, but never of all connections, whether they're in a transaction or not.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mail thread references in commits
On 30 November 2016 at 16:19, Andrew Dunstan wrote:
>
> https://www.postgresql.org/message-id/cab7npqthydyf-fo+fzvxrhz-7_hptm4rodbcsy9-noqhvet...@mail.gmail.com
>
> I'll be interested to know if it breaks anyone's MUA. If it doesn't all we
> will be arguing about are aesthetics, and I'm a firm believer in function
> over form.

I can't say I feel especially strongly either way on this, but just to toss out a small thing that might make a small difference: if you happen to know how your message-ids are generated then you might be able to do something useful with them. For instance, you could search all git commits for urls to messages you wrote -- any commit that has CAB7nPq is referencing a message written by Michael Paquier.

On the other hand you could put something naughty in the message-id, forcing everyone else to use URLs with dirty words in them. Or with words like "terrorist" in them. Or with some javascript/html injection attack of some sort...

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNDO and in-place update
On 24 November 2016 at 23:03, Robert Haas wrote:
>> For snapshot isolation Oracle has yet a *third* copy of the data in a
>> space called the "rollback segment(s)".
>
> My understanding is that this isn't correct. I think the rollback
> segments are what they call the thing that stores UNDO. See e.g.
> http://ss64.com/ora/syntax-redo.html

It looks like you're right. Rollback segments and undo segments are two different pieces of code, but one is just the old way and the other the new way of managing the same data. You can't have both active in the same database at the same time.

I'm a bit confused because I distinctly remembered an UNDO log back in the 8i days as well, but apparently that's just me imagining things. Undo segments were introduced in 9i. This explained a bunch: http://satya-dba.blogspot.ie/2009/09/undo-tablespace-undo-management.html

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNDO and in-place update
On 23 November 2016 at 04:28, Peter Geoghegan wrote:
> On Tue, Nov 22, 2016 at 7:01 PM, Robert Haas wrote:
>> This basic DO-UNDO-REDO protocol has been well-understood for
>> decades.
>
> FWIW, while this is basically true, the idea of repurposing UNDO to be
> usable for MVCC is definitely an Oracleism. Mohan's ARIES paper says
> nothing about MVCC.

Fwiw, Oracle does not use the undo log for snapshot fetches. It's used only for transaction rollback and recovery. For snapshot isolation Oracle has yet a *third* copy of the data, in a space called the "rollback segment(s)". When you update a row in a block you save the whole block in the rollback segment. When you try to access a block you check if the CSN -- which is basically equivalent to our LSN -- is newer than your snapshot, and if it is you fetch the old version of the block from the rollback segment.

Essentially their MVCC is done at a per-block level rather than a per-row level, and they keep only the newest version of the block in the table; the rest are in the rollback segment. For what it's worth I think our approach is cleaner and more flexible. They had a lot of trouble with their approach over the years, and it works well only because they invested an enormous amount of development in it and also because people throw a lot of hardware at it too.

I think the main use case we have trouble with is actually the "update every row in the table" type of update, which requires that we write to every block, plus a second copy of every block, plus write full pages of both copies, then later set hint bits, dirtying pages again and generating more full-page writes, then later come along and vacuum, which requires two more writes of every block, etc. If we had a solution for the special case of an update that replaces every row in a page, that would complement HOT nicely and go a long way towards fixing our issues.

Incidentally, the "interested transaction list" is for locking rows for updates, and it's basically similar to what we've discussed before of having a "most frequent xmin" in the page header and then a bit indicating the xmin is missing from the row header. Except in their case they don't need it for the actual xmin/xmax, because their visibility is done per-block; they need it only for the transient lock state.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Do we need use more meaningful variables to replace 0 in catalog head files?
On Tue, Nov 15, 2016 at 4:50 PM, Robert Haas wrote:
> On Sun, Nov 13, 2016 at 9:48 AM, Andrew Dunstan wrote:
>> I'm not convinced the line prefix part is necessary, though. What I'm
>> thinking of is something like this:
>>
>> PROCDATA( oid=1242 name=boolin isstrict=t volatile=i parallel=s nargs=1
>> rettype=bool argtypes="cstring" src=boolin );
>
> I liked Tom's format a lot better. If we put this in a separate file
> rather than in the header, which I favor, the PROCDATA stuff is just
> noise. On the other hand, having the name as the first thing on the
> line seems *excellent* for readability.

Just throwing this out there: it would be neat if the file format was precisely a tab- or comma-separated file suitable for loading into the appropriate table with COPY or loading into a spreadsheet. Then we might be able to maintain it by editing the table using SQL updates and/or other tools without having to teach them a particular input format.

The trick would then be to have a preprocessing step in the build which loaded the CSV/TSV files into hash tables and replaced all the strings or other tokens with OIDs and magic values.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Physical append-only tables
On Sun, Nov 13, 2016 at 3:45 PM, Magnus Hagander wrote:
> For a scenario like this, would it make sense to have an option that could
> be set on an individual table making it physical append only? Basically
> VACUUM would run as normal and clean up the old space when rows are deleted
> back in history, but when new space is needed for a row the system would
> never look at the old blocks, and only append to the end.

I don't think "appending" is the right way to think about this. It happens to address the problem, but only accidentally and only partially. More generally, what you have is two different kinds of data with two different access patterns and storage requirements in the same table. They're logically similar but have different practical requirements.

If there were some way to teach the database that your table is made of two different types of data and how to distinguish the two types, then when the update occurs it could move the row to the right section of storage... This might be something the new partitioning could handle, or it might need something more low-level and implicit.

That said, I don't think the "maintain clustering a bit better using BRIN" idea is a bad one. It's just the bit about turning a table append-only to deal with update-once data that I think is overreach.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JIT compiler for expressions
This sounds amazing. My only comment is that LLVM 3.7 is kind of old in the accelerated world of LLVM. If you have patches to LLVM that you need, you won't have much success submitting them as patches against 3.7. The current stable release is 3.9 and the development snapshots are 4.0.

I know LLVM moves quickly and that makes it hard to track its development. If you worked with 4.0 you might find the APIs you're using getting deprecated and rewritten several times while your project is under development.
Re: [HACKERS] emergency outage requiring database restart
On Thu, Oct 27, 2016 at 9:53 PM, Merlin Moncure wrote:
> I think we can rule out faulty storage

Nobody ever expects the faulty storage

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_hba_file_settings view patch
On Wed, Oct 26, 2016 at 11:04 PM, Joshua D. Drake wrote:
> On 10/26/2016 12:54 PM, Josh Berkus wrote:
>> I mean, I'm not particularly in favor of using JSON for this (arrays
>> seem OK), but that seems like an invalid reason not to.
>
> -1 to JSON for this.

Sigh. Well, I tried to review this patch in a previous iteration so let me give some context.

The fundamental problem is that the pg_hba.conf file has some bits of complex structure that aren't easily captured by linear arrays. The problem I struggled with most was the keywords like "all", "samerole", and "replication". A simple array of text makes it awkward to distinguish those keywords from quoted text values with the same content. And then there are the ldap options, which naturally would be a data type like json or hstore.

Some people wanted to store strings like '"all"' with the quotes, which I thought was ugly and functionally less useful because it would be hard to query and impossible to join against things like pg_user. Others wanted to give up the idea of expanding the entries at all and just have a single string for the whole line, which I thought was pointless -- you may as well just read the file then.

Personally my recommendation was to ignore the problem: just have arrays of text and document that if you have a real user by the name "all" or "samerole" then the view cannot be interpreted accurately. Tools like pgadmin which want to use the view could check for such users and display a warning or error rather than inaccurate information.

If there's any support for my recommendation I'm still happy to pick up the patch again and commit it.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Fri, Oct 21, 2016 at 9:03 PM, Stephen Frost wrote:
> WARNING: The following essential packages will be removed.
> This should NOT be done unless you know exactly what you are doing!
>   login
> 0 upgraded, 0 newly installed, 1 to remove and 71 not upgraded.
> After this operation, 1,212 kB disk space will be freed.
> You are about to do something potentially harmful.
> To continue type in the phrase 'Yes, do as I say!'

Another case was:

glxgears -iacknowledgethatthistoolisnotabenchmark

which was required to get it to print the FPS for a while.

The problem -- and also the advantage -- of this is that it's scriptable. That means people can still put it in recipe books and scripts, and others can copy it without being aware of what it does or even that they're doing it. I think the apt-get behaviour was specifically designed to ensure it couldn't easily be put into a script, which I would have said was desirable -- except I suspect there are situations where Postgres database scripts need to do a resetxlog. I'm not sure I can think of any examples offhand but I wouldn't be too surprised.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On Oct 20, 2016 5:27 PM, "Noah Misch" wrote:
>
> On Wed, Oct 19, 2016 at 11:08:39AM +0100, Greg Stark wrote:
>
> > The MEMPOOL_FREE doesn't take any size argument and mcxt.c doesn't
> > have convenient access to a size argument. It could call the
> > GetChunkSpace method but that will include the allocation overhead and
>
> That is indeed a problem for making VALGRIND_MEMPOOL_FREE() an alias of
> VALGRIND_MAKE_MEM_NOACCESS() under ASAN as I suggested. Calling
> GetMemoryChunkSpace() in the macro would cause memdebug.h to embed an
> assumption of mcxt.c, which is messy. Including the allocation overhead is
> fine, though.

I think the way out is to simply have aset.c make the memory undefined/noaccess even if it's redundant under valgrind. It's a bit unfortunate that the macros would have different semantics under the two tools. If it's too confusing, or we're worried about the performance overhead, we could make a MAKE_MEM_{UNDEFINED,NOACCESS}_IF_NO_MEMPOOL(), but I don't think it's worth it myself.

> > in any case isn't this memory already marked noaccess by aset.c?
>
> Only sometimes, when AllocSetFree() happens to call free() or wipe_mem().

I think if I did further surgery here I would rename wipe_mem and randomise_mem and make them responsible for making the memory undefined and noaccess as well. They would always be defined, so that would get rid of all the ifdefs except within those functions. Roughly the shape sketched below.

I have a patch working under asan on both gcc and clang. That handles noaccess but not undefined memory reads. I need to try msan to make sure the macro definitions work well for that API too. There are a couple of build oddities with both gcc and clang to sort out before I can commit anything though. And I can't test valgrind to be sure the redundant calls aren't causing a problem.
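For concreteness, here's a rough standalone sketch of the shape I mean. The structure and fallbacks are invented, not the real memdebug.h, though VALGRIND_MAKE_MEM_UNDEFINED/NOACCESS and ASAN_(UN)POISON_MEMORY_REGION are the real tool interfaces:

#include <string.h>

#if defined(USE_VALGRIND)
#include <valgrind/memcheck.h>
#elif defined(__SANITIZE_ADDRESS__)    /* gcc's macro; clang needs __has_feature() */
#include <sanitizer/asan_interface.h>
#define VALGRIND_MAKE_MEM_UNDEFINED(p, n)  ASAN_UNPOISON_MEMORY_REGION(p, n)
#define VALGRIND_MAKE_MEM_NOACCESS(p, n)   ASAN_POISON_MEMORY_REGION(p, n)
#else
#define VALGRIND_MAKE_MEM_UNDEFINED(p, n)  ((void) 0)
#define VALGRIND_MAKE_MEM_NOACCESS(p, n)   ((void) 0)
#endif

static void
wipe_mem(void *ptr, size_t size)
{
    /* make the region addressable long enough to clobber it... */
    VALGRIND_MAKE_MEM_UNDEFINED(ptr, size);
    memset(ptr, 0x7F, size);
    /* ...then poison it so any later access gets reported by the tool */
    VALGRIND_MAKE_MEM_NOACCESS(ptr, size);
}

With that, callers never mention the tool macros at all; wipe_mem (and randomise_mem) would be defined unconditionally and the ifdefs collapse into this one spot.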
Re: [HACKERS] [COMMITTERS] packing/alignment annotation for ItemPointerData redux
Sorry -- with the obvious error fixed:

$ /usr/bin/clang-4.0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -Wall -c clang-bug.c
clang-bug.c:55:9: warning: taking address of packed member 'ip_blkid' of class or structure 'ItemPointerData' may result in an unaligned pointer value [-Waddress-of-packed-member]
        return ItemPointerGetBlockNumber(&ip);
               ^~
clang-bug.c:49:25: note: expanded from macro 'ItemPointerGetBlockNumber'
        BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
                              ~~~^~~~
clang-bug.c:39:19: note: expanded from macro 'BlockIdGetBlockNumber'
        (BlockNumber) (((blockId)->bi_hi << 16) | ((uint16) (blockId)->bi_lo)) \
                       ^~~
clang-bug.c:55:9: warning: taking address of packed member 'ip_blkid' of class or structure 'ItemPointerData' may result in an unaligned pointer value [-Waddress-of-packed-member]
        return ItemPointerGetBlockNumber(&ip);
               ^~
clang-bug.c:49:25: note: expanded from macro 'ItemPointerGetBlockNumber'
        BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
                              ~~~^~~~
clang-bug.c:39:55: note: expanded from macro 'BlockIdGetBlockNumber'
        (BlockNumber) (((blockId)->bi_hi << 16) | ((uint16) (blockId)->bi_lo)) \
                                                            ^~~
2 warnings generated.

#if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
#define pg_attribute_aligned(a) __attribute__((aligned(a)))
#define pg_attribute_packed() __attribute__((packed))
#endif

typedef unsigned BlockNumber;
typedef unsigned short uint16;
typedef uint16 OffsetNumber;

typedef struct BlockIdData
{
    uint16 bi_hi;
    uint16 bi_lo;
} BlockIdData;

typedef struct ItemPointerData
{
    BlockIdData ip_blkid;
    OffsetNumber ip_posid;
}
/* If compiler understands packed and aligned pragmas, use those */
#if defined(pg_attribute_packed) && defined(pg_attribute_aligned)
pg_attribute_packed() pg_attribute_aligned(2)
#endif
ItemPointerData;

typedef ItemPointerData *ItemPointer;

/*
 * BlockIdGetBlockNumber
 *      Retrieve the block number from a block identifier.
 */
#define BlockIdGetBlockNumber(blockId) \
( \
    (BlockNumber) (((blockId)->bi_hi << 16) | ((uint16) (blockId)->bi_lo)) \
)

/*
 * ItemPointerGetBlockNumber
 *      Returns the block number of a disk item pointer.
 */
#define ItemPointerGetBlockNumber(pointer) \
( \
    BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
)

int main()
{
    ItemPointerData ip;
    return ItemPointerGetBlockNumber(&ip);
}

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] packing/alignment annotation for ItemPointerData redux
Ah. Here we go:

$ /usr/bin/clang-4.0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -Wall -c clang-bug.c
clang-bug.c:54:9: error: use of undeclared identifier 'BlockNumber'
        return ItemPointerGetBlockNumber(&ip);
               ^
clang-bug.c:48:2: note: expanded from macro 'ItemPointerGetBlockNumber'
        BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
        ^
clang-bug.c:38:3: note: expanded from macro 'BlockIdGetBlockNumber'
        (BlockNumber) (((blockId)->bi_hi << 16) | ((uint16) (blockId)->bi_lo)) \
        ^
1 error generated.

Preprocessor output:

# 1 "clang-bug.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 317 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "clang-bug.c" 2

typedef unsigned short uint16;
typedef uint16 OffsetNumber;

typedef struct BlockIdData
{
    uint16 bi_hi;
    uint16 bi_lo;
} BlockIdData;

typedef struct ItemPointerData
{
    BlockIdData ip_blkid;
    OffsetNumber ip_posid;
} __attribute__((packed)) __attribute__((aligned(2))) ItemPointerData;

typedef ItemPointerData *ItemPointer;
# 51 "clang-bug.c"
int main()
{
    ItemPointerData ip;
    return ( ( (BlockNumber) (((&(&ip)->ip_blkid)->bi_hi << 16) | ((uint16) (&(&ip)->ip_blkid)->bi_lo)) ) );
}

#if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
#define pg_attribute_aligned(a) __attribute__((aligned(a)))
#define pg_attribute_packed() __attribute__((packed))
#endif

typedef unsigned short uint16;
typedef uint16 OffsetNumber;

typedef struct BlockIdData
{
    uint16 bi_hi;
    uint16 bi_lo;
} BlockIdData;

typedef struct ItemPointerData
{
    BlockIdData ip_blkid;
    OffsetNumber ip_posid;
}
/* If compiler understands packed and aligned pragmas, use those */
#if defined(pg_attribute_packed) && defined(pg_attribute_aligned)
pg_attribute_packed() pg_attribute_aligned(2)
#endif
ItemPointerData;

typedef ItemPointerData *ItemPointer;

/*
 * BlockIdGetBlockNumber
 *      Retrieve the block number from a block identifier.
 */
#define BlockIdGetBlockNumber(blockId) \
( \
    (BlockNumber) (((blockId)->bi_hi << 16) | ((uint16) (blockId)->bi_lo)) \
)

/*
 * ItemPointerGetBlockNumber
 *      Returns the block number of a disk item pointer.
 */
#define ItemPointerGetBlockNumber(pointer) \
( \
    BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
)

int main()
{
    ItemPointerData ip;
    return ItemPointerGetBlockNumber(&ip);
}

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] packing/alignment annotation for ItemPointerData redux
On Wed, Oct 19, 2016 at 5:20 PM, Tom Lane wrote:
> Don't know how that version number compares to "3.8.0".

Argh. Just to further confuse matters, apparently the warnings are from clang 4.0. I had been testing with 3.8.0 earlier but updated at some point. And I haven't been able to reproduce them with a minimal test case yet.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] packing/alignment annotation for ItemPointerData redux
[resending to -hackers, sorry]

Back in 2001 a hack to add __attribute__((packed)) to ItemPtr was added with a comment "Appropriate whack upside the head for ARM" (dcbbdb1b3ee). I don't know if this is still a factor in 2016 or not, but it has already resulted in some collateral damage: in 2015 some compiler took that as license to align the whole struct on single-byte alignment when it was buried inside another struct (d4b538ea367de).

I just tried compiling with Clang 3.8.0 and got tons of warnings about this because:

'ItemPointerData' may result in an unaligned pointer value [-Waddress-of-packed-member]
        ...ItemPointerGetBlockNumber(&(xlrec->target_tid)),
           ^~~
../../../src/include/storage/itemptr.h:69:25: note: expanded from macro 'ItemPointerGetBlockNumber'
        BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
                              ~~~^~~~
../../../src/include/storage/block.h:118:19: note: expanded from macro 'BlockIdGetBlockNumber'
        (BlockNumber) (((blockId)->bi_hi << 16) | ((uint16) (blockId)->bi_lo)) \
                       ^~~

Which seems to indicate that clang may not understand the "pg_attribute_aligned(2)", or perhaps it does and just doesn't take it into account when generating these warnings.

I'm sure there are other people testing clang -- isn't it the default on MacOS? Do they not see these warnings?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On Sat, Feb 6, 2016 at 4:52 AM, Noah Misch wrote:
> aset.c relies on the fact that VALGRIND_MEMPOOL_ALLOC() has an implicit
> VALGRIND_MAKE_MEM_UNDEFINED() and VALGRIND_MEMPOOL_FREE() has an implicit
> VALGRIND_MAKE_MEM_NOACCESS(). #define those two accordingly. If ASAN has no

Actually this is confusing. aset.c doesn't actually use the MEMPOOL_ALLOC macro at all; it just calls UNDEFINED, DEFINED, and NOACCESS. mcxt.c, however, does do the MEMPOOL_ALLOC/FREE. So both layers are calling these macros for overlapping memory areas, which I find very confusing, and I'm not sure what the net effect is.

The MEMPOOL_FREE doesn't take any size argument, and mcxt.c doesn't have convenient access to a size argument. It could call the GetChunkSpace method, but that will include the allocation overhead, and in any case isn't this memory already marked noaccess by aset.c?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
On Mon, Oct 10, 2016 at 9:52 PM, Greg Stark wrote:
>
> The code is here:
>
> https://github.com/gsstark/retropg
>
> The build script is called "makeall" and it applies patches from the
> "old-postgres-fixes" directory though some of the smarts are in the
> script because it knows how to run older versions of the
> configure script and it tries to fix up the ecpg parser duplicate
> tokens separately. It saves a diff after applying the patches and
> other fixups into the "net-diffs" directory but I've never checked if
> those diffs would work cleanly on their own.

Fwiw, I was considering proposing committing some patches for these old releases to make them easier to build. I would suggest creating a tag for this stable legacy version and limiting the commits to just:

1) Disabling warnings
2) Fixing bugs in the configure scripts that occur on more recent systems (version number parsing etc)
3) Backporting things like the variable-length array code that prevents building
4) Adding compiler options like -fwrapv

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
On Mon, Oct 10, 2016 at 3:36 AM, Jim Nasby wrote:
> FWIW, Greg Stark did a talk at PG Open about PG performance going back to at
> least 7.4. He did discuss what he had to do to get those versions to compile
> on modern tools, and has a set of patches that enable it. Unfortunately his
> slides aren't posted[1] so I can't provide further details than that.

The code is here:

https://github.com/gsstark/retropg

The build script is called "makeall" and it applies patches from the "old-postgres-fixes" directory, though some of the smarts are in the script because it knows how to run older versions of the configure script, and it tries to fix up the ecpg parser duplicate tokens separately. It saves a diff after applying the patches and other fixups into the "net-diffs" directory, but I've never checked whether those diffs would apply cleanly on their own.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is it time to kill support for very old servers?
On Fri, Oct 7, 2016 at 3:06 PM, Tom Lane wrote:
> pg_dump alleges support for dumping from servers back to 7.0. Would v10
> be a good time to remove some of that code? It's getting harder and
> harder to even compile those ancient branches, let alone get people to
> test against them (cf. 4806f26f9). My initial thought is to cut support
> for pre-7.3 or maybe pre-7.4 servers, as that would allow removal of
> support for cases where the server lacks schemas or pg_depend, each of
> which requires a fair deal of klugery in pg_dump.

I might be expected to be the holdout here, but it seems sensible to me. Removing the code in pg_dump that deals with lacking schemas and pg_depend seems like a major simplification.

> In the same line, maybe we should kill libpq's support for V2 protocol
> (which would make the cutoff 7.4). And maybe the server's support too,
> though that wouldn't save very much code. The argument for cutting this
> isn't so much that we would remove lots of code as that we're removing
> code that never gets tested, at least not by us.

If it's testing we're concerned about, IIRC the current servers can be arm-twisted into speaking the V2 protocol, so it should be possible to test both modern servers and modern pg_dump using the V2 protocol with a simple tweak. Somehow removing the whole protocol support seems a bit different to me than removing pg_dump logic. For one thing, it's nice to be able to run a modern psql against old servers so you can run a benchmark script. For another, there may be binary-only applications or drivers out there that are using V2 for whatever reason.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Macro customizable hashtable / bitmapscan & aggregation perf
On Sat, Oct 1, 2016 at 1:44 AM, Andres Freund wrote:
>
> Unfortunately I'm running out of battery right now, so I don't want to
> re-run the benchmarks posted upthread (loading takes a while). But the
> last time I ran them all the results after the patches were better than
> before.

I have a machine sitting idle now too, if you have specific ideas of what to benchmark.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On Fri, Sep 30, 2016 at 2:11 AM, Robert Haas wrote:
> For one thing, we can stop shipping a totally broken feature in release after
> release

For what it's worth I'm for any patch that can accomplish that. In retrospect I think we should have done the hash-over-btree thing ten years ago, but we didn't, and if Amit's patch makes hash indexes recoverable today then go for it.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Order of operations in SubPostmasterMain()
On Thu, Sep 29, 2016 at 8:46 PM, Tom Lane wrote:
> We could probably refactor things enough so that we do pq_init()
> before PGSharedMemoryReAttach(). It would be a little bit ugly,
> and it would fractionally increase the chance of a reattach failure
> because pq_init() palloc's a few KB worth of buffers. I'm not quite
> sure if it's worth it; thoughts? In any case the mentioned comments
> are obsolete and need to be moved/rewritten.

Just speaking off the cuff without reviewing the code in detail... Alternately, we could call pq_init in the error path if it hasn't been called yet. I'm sure there are problems with doing that in general, but for the specific errors that can happen before pq_init it might be feasible, since they obviously can't have very much shared state yet to have corrupted.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On Wed, Sep 28, 2016 at 7:40 AM, Piotr Stefaniak wrote:
> Not remembering the context, I was initially confused about what exactly
> supposedly needs to be done in order to have ASan support, especially
> since I've been using it for a couple of years without any kind of
> modifications. Having read the whole thread now, I assume the discussion
> is now about getting MSan support, since apparently it's been already
> concluded that not much is needed for getting ASan support:

The difference between msan and asan is only related to whether uninitialized reads are tracked. All other memory errors, such as reading past the end of an allocation or reading after a free, are tracked by both.

Without asan support in Postgres's memdebug.h, asan will just track whether you're using memory that is outside the memory that malloc has handed Postgres. It doesn't know anything about whether that memory has been returned by palloc or has since been pfree'd. Even the bounds checking is not great, since you could be reading from palloc's header or from the bytes in the next palloc chunk that happened to be returned by the same malloc (or another malloc if you're unlucky).

The support I added to memdebug.h calls macros which call llvm intrinsics to mark the memory malloc'd by Postgres as unusable until it's returned by palloc. Once it's returned by palloc it's marked usable except for a "guard" byte at the end. Then pfree marks the memory unusable again. This basically mimics the behaviour you would get from asan if you were using malloc directly; the sketch below shows the idea.

I added support for msan as well, which is basically just one more macro to make the distinction between usable-but-uninitialized memory and usable-and-initialized memory. But I was unable to test it because msan didn't work for me at all. This seems to be the way of things with llvm. It's great stuff, but there's always 10% that is broken because there's some cool new thing that's better.
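To make the mechanism concrete, here's a toy standalone model of the poisoning scheme described above. It uses the real ASan interface (sanitizer/asan_interface.h) but none of the real allocator code; the names and the one-byte guard are illustrative. Compile with clang -fsanitize=address:

#include <stdlib.h>
#include <sanitizer/asan_interface.h>

static void *
toy_palloc(size_t size)
{
    /* over-allocate to leave a one-byte guard after the chunk */
    char       *chunk = malloc(size + 1);

    __asan_unpoison_memory_region(chunk, size);     /* usable... */
    __asan_poison_memory_region(chunk + size, 1);   /* ...except the guard */
    return chunk;
}

static void
toy_pfree(void *ptr, size_t size)
{
    /*
     * Unusable again until the pool hands it out once more.  A real pool
     * keeps the chunk for reuse; calling free() here would defeat the demo
     * since ASan tracks malloc/free itself.
     */
    __asan_poison_memory_region(ptr, size + 1);
}

int
main(void)
{
    char       *p = toy_palloc(8);

    p[0] = 'x';                 /* fine */
    toy_pfree(p, 8);
    /* p[0] = 'y';              would now be reported as a poisoned access */

    /* clean up for the demo so LeakSanitizer stays quiet */
    __asan_unpoison_memory_region(p, 9);
    free(p);
    return 0;
}

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers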
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On Tue, Sep 27, 2016 at 11:02 PM, Andres Freund wrote:
> Any plans to pick this up again?

Yeah, I was just thinking I should pick this up again.

> I vote for renaming the VALGRIND names etc. to something more tool-neutral. I
> think it's going to be too confusing otherwise.

Hm, the danger there is that once I start refactoring things I could get bogged down... I would love to remove all the #ifdefs and have the macros just be no-ops if they're compiled out, for example...

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On Thu, Sep 22, 2016 at 3:23 AM, Tom Lane wrote:
> But to kick the hash AM as such to the curb is to say
> "sorry, there will never be O(1) index lookups in Postgres".

Well, there's plenty of halfway solutions for that. We could move hash indexes to contrib, or even have them in core as experimental_hash or unlogged_hash until the day they achieve their potential.

We definitely shouldn't discourage people from working on hash indexes, but we probably shouldn't have released ten years' worth of a feature marked "please don't use this" that's guaranteed to corrupt your database and cause weird problems if you use it in any of a number of supported situations (including non-replicated system recovery, which has been a bedrock feature of Postgres for over a decade).

Arguably adding a hashed btree opclass and relegating the existing code to an experimental state would actually encourage development, since a) users would actually be likely to use the hashed btree opclass, so any work on a real hash opclass would have a real userbase ready and waiting for delivery, b) delivering a real hash opclass wouldn't involve convincing users to unlearn a million instructions warning not to use this feature, and c) the fear of breaking existing users' use cases and databases would be less, and pg_upgrade would be an ignorable problem, at least until the day comes for the big cutover of the default to the new opclass.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GiST penalty functions [PoC]
On Thu, Sep 8, 2016 at 9:29 AM, Andrew Borodin wrote:
>> autoconf check for IEEE 754 floats
> The Autoconf manual says the following:
>> it is safe to assume IEEE-754 in most portable code these days
> https://www.gnu.org/software/autoconf/manual/autoconf.html#Floating-Point-Portability

Personally I wouldn't be very happy about an IEEE 754 assumption. I did go to the trouble of testing Postgres on a VAX, and we fixed the few instances where it had such dependencies, for a net gain. If we intentionally put in a dependency in one place then we'll never be able to detect unintentional dependencies anywhere else.

I haven't followed the thread closely, but I'm puzzled why you would need to use bit twiddling to set a floating point number. Isn't there a perfectly good way to calculate the value you want using ldexp() and other standard C library functions? Something like the example below.
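For instance -- a standalone example of the kind of thing I mean, with "smallest power of two greater than x" picked arbitrarily as the target value -- frexp() and ldexp() let you read and set the binary exponent without touching the bit layout at all:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double      x = 1234.5;
    int         exp;

    /* x == mantissa * 2^exp, with mantissa in [0.5, 1) */
    double      mantissa = frexp(x, &exp);

    /* 2^exp: the smallest power of two strictly greater than x */
    double      next_pow2 = ldexp(1.0, exp);

    printf("%g = %g * 2^%d; next power of two: %g\n",
           x, mantissa, exp, next_pow2);
    return 0;
}

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers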
Re: [HACKERS] Tuplesort merge pre-reading
On Fri, Sep 9, 2016 at 1:01 PM, Heikki Linnakangas wrote:
> I'm happy with what it looks like. We are in fact getting a more sequential
> access pattern with these patches, because we're not expanding the pre-read
> tuples into SortTuples. Keeping densely-packed blocks in memory, instead of
> SortTuples, allows caching more data overall.

Wow, this is really cool. We should do something like this for query execution too.

I still didn't follow exactly why removing the prefetching allows more sequential i/o. I thought the whole point of prefetching was to reduce the random i/o from switching tapes.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Wed, Sep 7, 2016 at 1:45 PM, Simon Riggs wrote:
> On 6 September 2016 at 19:59, Tom Lane wrote:
>
>> The idea of looking to the stats to *guess* about how many tuples are
>> removable doesn't seem bad at all. But imagining that that's going to be
>> exact is folly of the first magnitude.
>
> Yes. Bear in mind I had already referred to allowing +10% to be safe,
> so I think we agree that a reasonably accurate, yet imprecise
> calculation is possible in most cases.

That would all be well and good if it weren't trivial to do what Robert suggested. This is just a large unsorted list that we need to iterate through. Just allocate chunks of a few megabytes, and when one is full allocate a new chunk and keep going. There's no need to get tricky with estimates and resizing and whatever. A sketch of what I mean is below.
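A minimal standalone sketch of the chunked list I have in mind -- the chunk size, the names, and the TID stand-in are all arbitrary, and the real thing would use palloc and the actual ItemPointerData:

#include <stdlib.h>

typedef unsigned long TidRepr;  /* stand-in for ItemPointerData */

#define CHUNK_TIDS ((8 * 1024 * 1024) / sizeof(TidRepr))    /* ~8MB per chunk */

typedef struct TidChunk
{
    struct TidChunk *next;
    size_t      used;
    TidRepr     tids[CHUNK_TIDS];
} TidChunk;

typedef struct DeadTupleList
{
    TidChunk   *head;           /* chunks kept in insertion (heap) order */
    TidChunk   *tail;
} DeadTupleList;

static void
add_dead_tuple(DeadTupleList *list, TidRepr tid)
{
    if (list->tail == NULL || list->tail->used == CHUNK_TIDS)
    {
        /* no estimating, no resizing: just grab another chunk */
        TidChunk   *c = malloc(sizeof(TidChunk));

        c->next = NULL;
        c->used = 0;
        if (list->tail)
            list->tail->next = c;
        else
            list->head = c;
        list->tail = c;
    }
    list->tail->tids[list->tail->used++] = tid;
}

Since insertion order is heap order, each chunk stays internally sorted, so the index-vacuum pass can still do its ordered lookups chunk by chunk.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers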
Re: [HACKERS] pg_sequence catalog
On Wed, Aug 31, 2016 at 3:01 PM, Tom Lane wrote:
>
> Uh, not as subtly as all that, because "select * from sequence" will
> now return a different set of columns, which will flat out break a
> lot of clients that use that method to get sequence properties.

So? Surely clients expect changes like this between major releases. Subtle changes that cause silent breakage for end-users seem scarier than unsubtle breakage that tool authors can fix.

On Wed, Aug 31, 2016 at 7:30 PM, Andres Freund wrote:
> Isn't that *precisely* when it's going to matter? If you have 5 active
> tables & sequences where the latter previously used independent locks,
> and they'd now be contending on a single lock. If you have hundreds of
> thousands of active sequences, I doubt individual page locks would
> become a point of contention.

I think even two sequences could be a point of contention if you, for example, are using COPY to load data into two otherwise completely independent tables in two separate processes. But that just means the row needs to be padded out to a cache line, no? Something like the sketch below. Or was the concern about things like trying to pin the same page, parse the same page header, follow nearby line pointers...? I'm not sure how effective all that caching is today, but it doesn't seem impossible to imagine getting it all optimized away.
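By "padded out to a cache line" I mean something like this standalone sketch; the field list and the 64-byte line size are assumptions for illustration:

#include <stdint.h>

#define CACHE_LINE_SIZE 64

typedef struct SequenceRow
{
    int64_t     last_value;
    int64_t     log_cnt;
    uint32_t    is_called;
} SequenceRow;

/* pad each row so two hot sequences never share a cache line */
typedef union SequenceRowPadded
{
    SequenceRow row;
    char        pad[CACHE_LINE_SIZE];
} SequenceRowPadded;

An array of these keeps two hot sequences from ever sharing a cache line, so two COPYs bumping different sequences don't false-share.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers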
Re: [HACKERS] autonomous transactions
On Sat, Sep 3, 2016 at 12:09 PM, Simon Riggs wrote:
> So doing autonomous transactions inside a single backend doesn't gain
> you very much, yet it is an enormously invasive patch to do it that
> way, not least because it requires you to rewrite locking and
> deadlocks to make them work correctly when proc is not 1:1 with xid.
> And as Serge points out it introduces other restrictions that we know
> about now, perhaps more as well.

Well, using a separate process also requires rewriting locking and deadlock detection, since a reasonable user might expect that second process to have access to data locked in their current transaction. The plus side is that we're already facing that issue with parallel query, so at least it's something that only has to be solved once instead of being a new problem.

Parallel query is currently restricted to read-only queries, however. Autonomous transactions will certainly need to be read-write, so the question then is what problems led to the restriction in parallel query, and are they any more tractable with autonomous transactions?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autonomous transactions
On Wed, Aug 31, 2016 at 3:11 PM, Craig Ringer wrote:
>
> I suspect that there'll be way too much code that relies on stashing
> xact-scoped stuff in globals for that to be viable. Caches alone.
> Peter will be able to explain more, I'm sure.
>
> We'd probably need a new transaction data object that everything
> xact-scoped hangs off, so we can pass it everywhere or swap it out of
> some global. The mechanical refactoring alone would be pretty scary,
> not to mention the complexity of actually identifying all the less
> obvious places that need changing.

Well, this is the converse of the same problem. Today process state and transaction state are tied together. One way or another you're trying to split that -- either by having two processes share state or by having one process manage two transactions. I suppose we already have the infrastructure for parallel query, so there's at least some shared problem space there.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autonomous transactions
On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut wrote:
> - An API interface to open a "connection" to a background worker, run
> queries, get results: AutonomousSessionStart(), AutonomousSessionEnd(),
> AutonomousSessionExecute(), etc. The communication happens using the
> client/server protocol.

I'm surprised by the background worker. I had envisioned autonomous transactions being implemented by allowing a process to reserve a second entry in PGPROC with the same pid, or perhaps by saving its existing information in a separate PGPROC slot (or a stack of slots) and restoring it after the autonomous transaction commits.

Using a background worker means that the autonomous transaction can't access any state from the process memory. Parameters in plpgsql are a symptom of this, but I suspect there will be others. What happens if a statement timeout occurs during an autonomous transaction? What happens if you use a pl language in the autonomous transaction and it tries to use non-transactional information such as prepared statements?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pinning a buffer in TupleTableSlot is unnecessary
On Tue, Aug 30, 2016 at 11:12 AM, Heikki Linnakangas wrote:
> While profiling some queries and looking at executor overhead, I realized
> that we're not making much use of TupleTableSlot's ability to hold a buffer
> pin. In a SeqScan, the buffer is held pinned by the underlying heap-scan
> anyway. Same with an IndexScan, and the SampleScan. The only thing that
> relies on that feature is TidScan, but we could easily teach TidScan to hold
> the buffer pin directly.
>
> So, how about we remove the ability of a TupleTableSlot to hold a buffer
> pin, per the attached patch? It shaves a few cycles from a ExecStoreTuple()
> and ExecClearTuple(), which get called a lot. I couldn't measure any actual
> difference from that, though, but it seems like a good idea from a
> readability point of view, anyway.

Out of curiosity, why go in this direction and not the other? Why not move those other scans to use the TupleTableSlot API to manage the pins? Offhand it sounds more readable, not less, to have the TupleTableSlot be a self-contained data structure that guarantees the lifetime of the pins matches that of the slots, rather than to have a dependency on the code structure in some far-away scan.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix some corner cases that cube_in rejects
On Mon, Aug 29, 2016 at 7:19 PM, Tom Lane wrote:
> To deal with the infinity/NaN issues, I made cube_in and cube_out rely
> on float8in_internal and float8out_internal, as we recently did for the
> core geometric types. That causes the response to "1e-700" to be an
> out-of-range error rather than silent underflow, which seems to me to
> be fine, especially since it removes the platform dependency that's
> responsible for needing the separate cube_1.out and cube_3.out files.

So what happens to a database that has invalid cubes in it already? Offhand I suspect they mostly become valid, since float8in will handle NaN and the like.

Incidentally, I so wish this module were named "vector" instead of "cube". I was confused by it for ages, and it still confuses me for a few moments before I remember what it does.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
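For what it's worth, a sketch of the behaviour change being described, assuming the patched cube_in accepts exactly what float8in accepts (the error text below is float8in_internal's, from memory):

SELECT 'NaN'::cube;                     -- previously platform-dependent; now accepted
SELECT '(-Infinity),(Infinity)'::cube;  -- likewise

SELECT '1e-700'::cube;
-- previously silent underflow to 0 on most platforms; now something like:
-- ERROR:  "1e-700" is out of range for type double precision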
[HACKERS] Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On Tue, Aug 23, 2016 at 4:15 PM, Aleksander Alekseev wrote:
> Frankly I have much more faith in Tom's idea of using virtual part of the
> catalog for all temporary tables, i.e turning all temporary tables into
> "fast" temporary tables. Instead of introducing a new type of temporary tables
> that solve catalog bloating problem and forcing users to rewrite applications
> why not just not to create a problem in a first place?

Would applications really need to be rewritten? Are they really constructing temporary tables where the definition of the table is dynamic, not just the content? I think application authors would be pretty happy not to have to keep recreating the same tables over and over again and dealing with DDL in their run-time code. It's not really rewriting an application to just remove that DDL and move it to the one-time database schema creation.

I think it's clear we got the idea of temporary tables wrong when we implemented them, and that the SQL standard's approach is more useful. It's not just some implementation artifact that the standard's tables can be implemented efficiently. It's a fundamental design difference, and experience shows that separating out the DDL and keeping it static while the DML stays dynamic is simply a better design.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
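To make the contrast concrete, roughly what the two models look like (the session_scratch table is made up; note PostgreSQL parses the GLOBAL keyword but does not implement the standard semantics -- it just creates an ordinary per-session temp table):

-- Today: every application session repeats the DDL at run time,
-- creating and dropping catalog entries each time:
CREATE TEMP TABLE session_scratch (id int, payload text)
    ON COMMIT DELETE ROWS;

-- SQL-standard style: the definition is created once, at schema
-- creation time; each session then runs only DML against its own
-- private, initially empty instance of the table:
CREATE GLOBAL TEMPORARY TABLE session_scratch (id int, payload text)
    ON COMMIT DELETE ROWS;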
Re: [HACKERS] Are these supported??
On Wed, Aug 17, 2016 at 4:22 PM, Robert Haas wrote: > We've supported having joins in a DELETE since PostgreSQL 8.1. Er, yes. Though he does say he's trying to use the same syntax as select... -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Are these supported??
On Wed, Aug 17, 2016 at 4:33 AM, Vince Vielhaber wrote:
>
> I recently moved a mybb forum away from mysql to postgres. Along the way I
> encountered a couple of things that either didn't seem to be supported or
> I'm just not doing it right.
>
> First, the server this is on is running version 8.4.22. php is 5.6.22.

8.4 is very old; it's been unsupported for two years already.

You can't write joins in a DELETE the way you do in a SELECT -- which table would it actually delete from? You can use a subselect to look up information from other tables in your DELETE, though.

EXTRACT and date_part have no idea where the data they're passed came from; it can come from tables, other functions, or expressions. The error you quoted, however, indicates that dateline is of type integer. The syntax for EXTRACT is confusing (blame the SQL committee...) but you don't want the extra "timestamp" keyword before the column there -- in the documentation's examples that keyword is part of the literal, making the string be read as a timestamp.

https://www.postgresql.org/docs/8.4/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
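Concretely, something like this (dateline comes from the problem description; the mybb_posts/mybb_threads names and columns are just illustrative guesses at the schema):

-- What the docs' examples show: "TIMESTAMP" introduces a literal,
-- it is not a cast of a column:
SELECT EXTRACT(YEAR FROM TIMESTAMP '2001-02-16 20:38:40');

-- With an integer epoch column, convert it first:
SELECT EXTRACT(YEAR FROM to_timestamp(dateline)) FROM mybb_posts;

-- And a DELETE that consults another table via a subselect rather
-- than a join:
DELETE FROM mybb_posts
 WHERE tid IN (SELECT tid FROM mybb_threads WHERE visible = 0);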
Re: [HACKERS] Why we lost Uber as a user
On Wed, Aug 17, 2016 at 1:36 AM, Jim Nasby wrote:
> Something I didn't see mentioned that I think is a critical point: last I
> looked, HOT standby (and presumably SR) replays full page writes. That means
> that *any* kind of corruption on the master is *guaranteed* to replicate to
> the slave the next time that block is touched. That's completely the
> opposite of trigger-based replication.

Yes, this is exactly what it should be doing, and exactly why it's useful. Physical replication accurately replicates the data from the master, "corruption" included, whereas a logical replication system will not, causing divergence and possible issues during a failover.

Picture yourself as Delta: you have a fire in your data centre and go to fail over to your secondary site. Your DBAs inform you that the secondary site has "fixed" some corruption that you were unaware of and that wasn't causing any issues -- and now, in the middle of the business crisis, is when you get to spend time identifying and repairing the problem, because your business logic has suddenly started running into problems.

Physical replication addresses the same use cases as physical backups. Both provide you with exactly what you had prior to the recovery, no more and no less. That's what you want when recovering from a disaster.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] bigint txids vs 'xid' type, new txid_recent(bigint) => xid
On Tue, Aug 16, 2016 at 10:15 AM, Craig Ringer wrote:
> I'm surprised the 32-bit xid was ever exposed to the user, rather than a
> 64-bit epoch-extended xid.

Once upon a time we didn't have epoch counting at all.

I don't think it would be a bad idea to clean up everything to do with xids so that everything user-facing is epoch-aware. Of course you don't always have the epoch available, but if we're careful about where users can see xids, they should never see an xid from an old epoch. That could be a problem for internal tools like pageinspect or xlogdump, but it shouldn't be a problem for any real production API.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
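For reference, txid_current() already returns the epoch-extended 64-bit form; pulling it apart is simple bit arithmetic (all three calls below run in one transaction, so they see the same value):

-- Epoch in the high 32 bits, the on-disk xid in the low 32:
SELECT txid_current()                           AS txid64,
       txid_current() >> 32                     AS epoch,
       txid_current() & ((1::bigint << 32) - 1) AS xid32;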
Re: [HACKERS] PSA: Systemd will kill PostgreSQL
On Tue, Aug 16, 2016 at 12:41 AM, Josh Berkus wrote:
> Presumably people just need to add the system account tag to the unit
> file, no?

That's a system-level change, though. How would a normal user manage this?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Let's get rid of the separate minor version numbers for shlibs
On Mon, Aug 15, 2016 at 11:45 PM, Peter Eisentraut wrote:
> If we instead installed
>
> libpq.so.5 (actual file)
> libpq.so -> libpq.so.5
>
> nothing would effectively change.

It would make it impossible to have multiple versions installed. One doesn't normally have multiple versions with the same soname installed, since the linker will only actually use one of them anyway, but having minor versions makes transitions and debugging a lot easier to manage. I don't think it would be very standard to have no minor version at all.

Also, the above is only true for Linux; other operating systems may set things up differently. IIRC NetBSD did something a lot more sophisticated, where it linked to the correct minor version, or gave a warning and linked with a newer minor version, but refused to link to older minor versions even when the major version was right -- or something like that.

But I think Tom was not proposing dropping the minor version at all, just setting the minor version equal to the release. So our soname would be libpq.so.5.10, then libpq.so.5.11, and so on until we make an incompatible change, at which point it would become, say, libpq.so.6.12. I think it would be a bit confusing to see both the minor and major version bump, and to see a major version start at a high number, but sonames are pretty technical internal goo, so it's not too bad. I think it would be a reasonable idea. It does rule out the possibility of a minor bump in the soname for a point release, which as you point out wouldn't do much on Linux, but on other operating systems it might be a useful thing.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
On Sat, Aug 13, 2016 at 1:18 AM, Andrew Gierth wrote:
>
> Hmm? The code in _bt_findsplitloc and _bt_checksplitloc doesn't seem to
> agree with this.
>
> (Inserting on the high leaf page is a special case, which is where the
> fillfactor logic kicks in; that's why sequentially filled indexes are
> (by default) 90% full rather than 100%. But other pages split into
> roughly equal halves.)

Hm, I was going from this lore. I didn't realize the uneven split applied only to inserts near the end of the index. That's cleverer than I thought.

commit 1663f3383849968415d29965ef9bfdf5aac4d358
Author: Tom Lane
Date: Sat Sep 29 23:49:51 2001 +

    Tweak btree page split logic so that when splitting a page that is
    rightmost on its tree level, we split 2/3 to the left and 1/3 to the
    new right page, rather than the even split we use elsewhere. The idea
    is that when faced with a steadily increasing series of inserted keys
    (such as sequence or timestamp values), we'll end up with a btree
    that's about 2/3ds full not 1/2 full, which is much closer to the
    desired steady-state load for a btree. Per suggestion from Ann
    Harrison of IBPhoenix.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
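Easy enough to see at home with contrib/pgstattuple, incidentally -- a sequentially loaded index should report leaf density near the default 90% fillfactor rather than 50% (the seqload table is just a throwaway demo):

CREATE EXTENSION IF NOT EXISTS pgstattuple;

CREATE TABLE seqload (id int PRIMARY KEY);
INSERT INTO seqload SELECT g FROM generate_series(1, 1000000) g;

-- Sequential inserts keep splitting the rightmost page at fillfactor,
-- so avg_leaf_density should come out close to 90:
SELECT avg_leaf_density FROM pgstatindex('seqload_pkey');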