[HACKERS] Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?
On 31 October 2017 at 07:05, Chris Travers wrote:
> Hi;
>
> After Andres's excellent talk at PGConf we tried benchmarking
> effective_io_concurrency on some of our servers and found that those which
> have a number of NVME storage volumes could not fill the I/O queue even at
> the maximum setting (1000).

And was the system still I/O bound? If the CPU was 100% busy then perhaps Postgres just can't keep up with the I/O system. It depends on the workload though; if you start many very large sequential scans you may be able to push the I/O system harder.

Keep in mind effective_io_concurrency only really affects bitmap heap scans (and to a small degree index scans). It works by issuing posix_fadvise() calls for upcoming buffers one by one. That gets multiple spindles active, but it's not really going to scale to many thousands of prefetches (an effective_io_concurrency of 1000 actually means 7485 prefetches). At some point those I/Os are going to start completing before Postgres even has a chance to start processing the data.

-- greg
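For what it's worth, that 7485 figure falls out of the planner's spindle heuristic, which issues roughly n * H(n) prefetches for effective_io_concurrency = n, H(n) being the n-th harmonic number. A quick sanity check of the arithmetic in plain SQL, assuming that heuristic:

    -- n * H(n) for n = 1000: the prefetch target behind the 7485 figure
    SELECT round(1000 * sum(1.0 / i)) AS prefetch_target
    FROM generate_series(1, 1000) AS i;
    -- => 7485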
Re: [HACKERS] Current int & float overflow checking is slow.
We already know this integer overflow checking is non-standard and compilers keep trying to optimize it out. Our only strategy to defeat that depends on compiler flags like -fwrapv that vary by compiler and may or may not be working on less well-tested compilers. So if there's a nice, readable, and convenient way to portably use CPU flags, that would be brilliant. And I'm not too concerned if it doesn't run on VAX.
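For reference, whatever the implementation -- explicit comparisons, -fwrapv, or CPU overflow flags -- the user-visible contract being protected is the same familiar error:

    -- The behavior the overflow check exists to provide:
    SELECT 2147483647::int4 + 1;
    -- ERROR:  integer out of range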
Re: [HACKERS] On markers of changed data
On 10 October 2017 at 23:50, Stephen Frost wrote:
> Yeah, it sounds interesting, but I was just chatting w/ David about it
> and we were thinking about how checkpoints are really rather often done,
> so you end up with quite a few of these lists being out there.
>
> Now, if the lists were always kept in a sorted fashion, then perhaps we
> would be able to essentially merge-sort them all back together and
> de-dup that way but even then, you're talking about an awful lot if
> you're looking at daily incrementals- that's 288 standard 5-minute
> checkpoints, each with some 128k pages changed, assuming max_wal_size of
> 1GB, and I think we can all agree that the default max_wal_size is for
> rather small systems. That ends up being something around 2MB per
> checkpoint to store the pages in or half a gig per day just to keep
> track of the pages which changed in each checkpoint across that day.

I was actually imagining a bitmap, probably one for each 1GB piece of each table. That's probably how you would maintain this data in memory anyway, and after compression it should be fairly small. You'll probably be modifying the same blocks frequently, or doing bulk loads which touch a consecutive range of blocks. But that's still about the same amount of data.

Probably you don't want to keep every checkpoint anyway, though. The nice thing about the changelists is that they tend to reach a maximum size regardless of how long a time range they span, so if you keep one changelist for every 10 checkpoints, or every 100 checkpoints, you could reduce the storage needs and only lose time precision.

-- greg
Re: [HACKERS] On markers of changed data
On 8 October 2017 at 08:52, Andrey Borodin wrote:
>
> 1. Any other marker would be better (It can be WAL scan during archiving,
> some new LSN-based mechanics, etc.)

The general shape of what I would like to see is some log which lists where each checkpoint starts and ends and which blocks were modified since the previous checkpoint. Then, to generate an incremental backup from any point in time to the current one, you union all the block lists between them and fetch those blocks.

There are other ways of using this aside from incremental backups on disk too -- you could imagine a replica that has fallen behind requesting the block lists and then fetching just those blocks instead of needing to receive and apply all the WAL. Or possibly even making a cost-based decision between the two, depending on which would be faster.

It would also be useful for going in the reverse direction: looking up all the records (or just the last record) that modified a given block. Instead of having to scan all the WAL, you would only need to scan the checkpoint eras that had touched that block.

-- greg
Re: [HACKERS] generated columns
There are some unanswered questions with column grants too. Do we allow granting access to a calculated column which accesses columns the user doesn't have access to? If so, then this is a suitable substitute for using updatable views to handle things like granting users access to password hashes or personal data with the details censored, without giving them access to the unhashed password or the full personal info.

-- greg
Re: [HACKERS] 64-bit queryId?
On 1 October 2017 at 16:40, Tom Lane wrote:
> Greg Stark writes:
>> Indeed. It's simple enough to export stats to prometheus with queryid
>> as the key. Then even if the query ages out of the database stats you
>> have graphs and derivative metrics going further back.
>
> I'm not really ready to buy into that as a supported use-case, because
> it immediately leads to the conclusion that we need to promise stability
> of query IDs across PG versions, which seems entirely infeasible to me
> (at least with anything like the current method of deriving them).

Well, these kinds of monitoring systems tend to be used by operations people, who are a lot more practical and a lot less worried about theoretical concerns like that. What they're going to want to do is things like "alert on any query using more than 0.1 cpu core", or "alert on any query being the top I/O consumer that isn't amongst the top 10 over the previous day", or "alert on any query using more than 4x the cpu or I/O on one database than it does on average across all our databases". That last one is a case where it would be nice if the queryid values were stable across versions, but it's hardly surprising that they aren't.

In context, the point was merely that the default pg_stat_statements.max of 5000 isn't sufficient to argue that 32-bit values are enough. It wouldn't be hard for there to be 64k different queries over time and across all the databases in a fleet, and at that point a 32-bit collision becomes likely.

-- greg
Re: [HACKERS] 64-bit queryId?
On 30 September 2017 at 21:03, Alexander Korotkov wrote:
> I heard from customers that they periodically dump contents of
> pg_stat_statements and then build statistics over long period of time. If
> even they leave default pg_stat_statements.max unchanged, probability of
> collision would be significantly higher.

Indeed. It's simple enough to export stats to prometheus with queryid as the key. Then, even if the query ages out of the database stats, you have graphs and derivative metrics going further back.

I have to admit this was my first reaction to the idea of using sha1 hashes for git commits as well. But eventually I came around. If the chances of a hash collision are smaller than a cosmic ray flipping a bit, or digital electronics falling into a metastable state, then I had to admit there's not much value in being theoretically more correct. In practice, if the query has aged out of pg_stat_statements and you're exporting pg_stat_statements metrics to longer-term storage, there's really nothing more "correct" than just using a long enough hash and assuming there are no collisions. If 64 bits is still not sufficient, we could just go to 160-bit sha1 or 256-bit sha256.

Actually, there's a reason I'm wondering whether we shouldn't use a cryptographic hash, or even an HMAC. Currently, if you're a non-superuser we, quite sensibly, hide the query text. But we also hide the queryid. The latter is really inconvenient, since it makes the stats utterly useless. I'm not sure what the rationale was, but the only thing I can think of is a fear that it's possible to reverse-engineer the query using brute force. An HMAC, or for most purposes even a simple cryptographic hash with a secret salt, would make that impossible.

(I have other advances in pg_stat_statements I would love to see. It would be so much more helpful if pg_stat_statements also kept a few examples of query parameters, such as the most recent set, the set that caused the longest execution, and maybe the set with the largest value of each metric.)

-- greg
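As a rough illustration of the HMAC idea -- this is only a sketch using the pgcrypto extension's hmac() function, and is not how pg_stat_statements computes queryid today:

    CREATE EXTENSION IF NOT EXISTS pgcrypto;

    -- Keyed, non-reversible identifier for a normalized query text;
    -- without the per-cluster secret, brute-forcing candidate query
    -- texts to match an exposed id becomes infeasible.
    SELECT encode(hmac('SELECT * FROM t WHERE id = $1',  -- normalized query text
                       'per-cluster secret',              -- hypothetical salt/key
                       'sha256'),
                  'hex') AS opaque_queryid;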
Re: [HACKERS] The case for removing replacement selection sort
On 8 September 2017 at 18:06, Peter Geoghegan wrote:
> * It's still faster with int4/int8 comparisons on modern hardware, and
> I think that most ordered inputs will be of those types in practice.

This may be a bit "how long is a piece of string", but how do those two compare with string sorting in an interesting encoding/locale -- say, /usr/share/dict/polish in pl_PL, for example? It's certainly true that people sort text as well as numbers. Also, people often sort on keys of more than one column.

-- greg
Re: [HACKERS] JIT compiling expressions/deform + inlining prototype v2.0
On 5 September 2017 at 11:58, Konstantin Knizhnik wrote:
>
> I wonder if we can perform some optimization in this case (assuming that in
> typical cases a column either contains mostly non-null values, or mostly
> null values).

If you really wanted to go crazy here, you could do lookup tables on bytes of the null bitmap. I.e., you look at the first byte of the null bitmap, index into an array, and it points to 8 offsets for the 8 fields covered by that much of the bitmap. The lookup table might be kind of large, since offsets are 16 bits, so you're talking 256 * 16 bytes, or 4kB, for every 8 columns up until the first variable-size column (or I suppose you could even continue in the case where the variable-size column is null).

-- greg
[HACKERS] CSV Logging questions
I was just looking over the CSV logging code and have a few questions about why things were done the way they were done.

1) Why do we gather a per-session log line number? Is it just to aid people importing, by letting them avoid duplicate entries from partial files? Is there some other purpose, given that entries will already be sequential in the CSV file?

2) Why is the error location (file, line, function) conditional on log_error_verbosity? Surely the whole point of a structured log is that you can log everything and choose what to display later -- which is why CSV logging doesn't look at log_line_prefix to determine which other bits to display. There's no added cost to include this information unconditionally, and it's far from the largest piece of data being logged either.

3) Similarly, I wonder if the statement should always be included, even when hide_stmt is set, so that users can write sensible queries against the data even if it means duplicating data.

4) Why the session start time? Is this just so that it uniquely identifies a session? Should we perhaps generate a unique session identifier instead?

The real reason I'm looking at this is that I'm looking at the json_log plugin from Michael Paquier. It doesn't have the log line numbers, and I can't figure out whether it should, because I can't quite figure out why they exist in the CSV files. I think there are a few other fields that have been added in Postgres but are missing from the JSON log because of version skew.

I'm wondering if we should abstract out the CSV format, so that instead of using emit_log_hook you would add a new format, and it would specify an "add_log_attribute(key, val)" hook which would get called once per log format. You could then have as many log formats as you want and be sure they would all have the same data. That would also mean the timestamps would be in sync, and we could probably eliminate the occurrences of the wrong format appearing in the wrong logs.

-- greg
Re: [HACKERS] log_destination=file
On 31 August 2017 at 13:49, Tom Lane wrote:
> Magnus Hagander writes:
>> On Thu, Aug 31, 2017 at 2:34 PM, Tom Lane wrote:
> Yes, it's pretty important, because of assorted stuff not-under-our-
> control that doesn't know about ereport and will just print to stderr
> anyway. Some examples: dynamic linker can't-resolve-symbol failure
> messages, glibc malloc corruption error messages, just about any external
> module in plperl or plpython. I don't find this to be negotiable.

So what happens now with these messages? My understanding is that they're missing from the CSV logs and are simply inserted into the text logs without any log_line_prefix? The logging collector doesn't recognize these messages and reformat them for the CSV logs, does it?

I'm actually asking because I'm more concerned with JSON logs or msgpack logs. Currently these are supported with an emit_log_hook, but they can't capture these non-ereport logs either. Also, the CSV and emit_log_hook based logs don't have any convenient way to turn them on and off, or to control the location and filename of the logs. It would be nice if we could have something like:

    log_destinations = 'stderr=text,syslog=json,postgresql-%Y-%m-%d_%H%M%S.csv=csv'

>> Are you actually asking for a benchmark of if logging gets slower?
>
> Yes.

Personally, I don't think it's "performance" so much as operational issues that are more concerning. For all we know there are people out there who tried to use the logging collector, found it didn't work well on some system -- perhaps it interacted badly with systemd or something else -- and switched back to just using stderr. I don't know how to flush these users out, though, if there are any. Making this change early in a release cycle is the best we can do.

-- greg
[HACKERS] signed logging format for pid in log_line_prefix?
Both the text and CSV logging seem to use %d for logging the server pid:

    appendStringInfo(buf, "%d", MyProcPid);

Am I missing something, or wouldn't this mean we print pids with large values as negative numbers? Isn't that strange? Wouldn't we rather use %u here?

-- greg
Re: [HACKERS] generated columns
On 31 August 2017 at 05:16, Peter Eisentraut wrote:
> Here is another attempt to implement generated columns. This is a
> well-known SQL-standard feature, also available for instance in DB2,
> MySQL, Oracle. A quick example:
>
> CREATE TABLE t1 (
>   ...,
>   height_cm numeric,
>   height_in numeric GENERATED ALWAYS AS (height_cm * 2.54)
> );

I only recently discovered we actually already have this feature. Kind of.

stark=# CREATE TABLE t1 (height_cm numeric);
CREATE TABLE
Time: 38.066 ms
stark***=# create function height_in(t t1) returns numeric language 'sql' as 'select t.height_cm * 2.54';
CREATE FUNCTION
Time: 1.216 ms
stark***=# insert into t1 values (2);
INSERT 0 1
Time: 10.170 ms
stark***=# select t1.height_cm, t1.height_in from t1;
┌───────────┬───────────┐
│ height_cm │ height_in │
├───────────┼───────────┤
│         2 │      5.08 │
└───────────┴───────────┘
(1 row)

Time: 1.997 ms

Yours looks better :)

-- greg
Re: [HACKERS] MAIN, Uncompressed?
On 25 August 2017 at 19:59, Simon Riggs wrote:
> On 25 August 2017 at 14:08, Tom Lane wrote:
>> Maybe, but the use case seems mighty narrow.
>
> JSON blobs between 2kB and 8160 bytes are very common.
>
> String length is maybe a poisson distribution, definitely not uniform.

But JSON blobs should be highly compressible. Even jsonb will be quite compressible.

That said, I always found remembering the mapping from these names to the various behaviours quite hard. I would have found it far more useful to have two separate properties I could set, "compress" and "external" -- or, perhaps even more useful, some kind of guideline size threshold for each (and perhaps a second compress threshold and external threshold for the whole tuple).

I think this is a particularly old piece of code, and we're lucky the default heuristics have served well for all this time, because I doubt many people fiddle with these storage attributes. The time may have come to come up with a better UI for the storage attributes, because people are doing new things (like JSON) and wanting more control over this heuristic.

For what it's worth, I think a good start would be to give people more visibility into what the tuptoaster heuristic is actually doing to their data; that will encourage people to give feedback about when they're surprised or frustrated by the existing UI.

-- greg
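For reference, the existing per-column UI being criticized -- each keyword bundles the compression and out-of-line decisions together (the table and column names here are just placeholders):

    ALTER TABLE t ALTER COLUMN doc SET STORAGE PLAIN;     -- no compression, inline only
    ALTER TABLE t ALTER COLUMN doc SET STORAGE MAIN;      -- compression, inline preferred
    ALTER TABLE t ALTER COLUMN doc SET STORAGE EXTERNAL;  -- out-of-line allowed, no compression
    ALTER TABLE t ALTER COLUMN doc SET STORAGE EXTENDED;  -- both (the default)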
Re: [HACKERS] Funny WAL corruption issue
On 10 August 2017 at 15:26, Chris Travers wrote:
>
> The bitwise comparison is interesting. Remember the error was:
>
> pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected
> pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset
> 1146880
...
> Since this didn't throw a checksum error (we have data checksums disabled but
> wal records ISTR have a separate CRC check), would this perhaps indicate that
> the checksum operated over incorrect data?

No checksum error plus this "unexpected pageaddr" doesn't necessarily mean data corruption. It could mean that when the database stopped logging it was reusing a WAL file, and the old WAL stream happened to have a record boundary at the same byte position. So the previous record's checksum passed, and the following record's checksum passes too, but the record header is for a different position in the WAL stream.

I think you could actually hack pg_xlogdump to ignore this condition and keep dumping; you'd see whether the records that follow appear to be old WAL data. I haven't actually tried this, though.

-- greg
Re: [HACKERS] More optimization effort?
On 21 July 2017 at 20:00, Tom Lane wrote:
>> I have, however, decided not to volunteer to be the one who works on
>> that project.
>
> Me either. Any one of these things would require a *lot* of work in
> order to have a coherent feature that provided useful behavior across
> a bunch of different datatypes.

I had in the past idly thought about whether it would be possible to link in one of the various general-purpose theorem-proving libraries and use it to simplify expressions. But I really have no idea how much work it would be to teach one about all the properties and constraints of our existing data types and operators -- or, for that matter, how easy it would be to figure out which theorems we want proven in order to be able to use an index.

-- greg
Re: [HACKERS] [GENERAL] huge RAM use in multi-command ALTER of table heirarchy
On 20 July 2017 at 14:19, Tom Lane wrote:
> Greg Stark writes:
>
>> Would it be useful to keep in one of the memory checking assertion builds?
>
> Why? Code that expects to continue accessing a relcache entry's tupdesc
> after closing the relcache entry is broken, independently of whether it's
> in a debug build or not.

Mea culpa. I hadn't actually read the code, and assumed it would be sensible to change from something that freed these tupdescs into something that raised an assertion if they were still unfreed at end of transaction. Reading the code, I see that it's only there to *avoid* freeing the tupdesc, just in case something is still using it. If we just freed it, the normal memory checking builds would catch cases where it's used after being freed.

But what I still don't understand is whether the fact that it still passes the regression tests means anything. Unless there happened to be a sinval message at the wrong time, the regression tests wouldn't uncover any problem cases. If it passed the tests on a CLOBBER_CACHE_ALWAYS build, would that prove it's safe? Or perhaps, if the columns haven't actually been altered, it would still fail to fail?

-- greg
Re: [HACKERS] [GENERAL] huge RAM use in multi-command ALTER of table heirarchy
On 19 July 2017 at 00:26, Tom Lane wrote:
> It's probably a bit late in the v10 cycle to be taking any risks in
> this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
> as soon as the v11 cycle opens, unless someone can show an example
> of non-broken coding that requires it. (And if so, there ought to
> be a regression test incorporating that.)

Would it be useful to keep in one of the memory checking assertion builds?

-- greg
Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
On 15 July 2017 at 23:00, Tom Lane wrote:
> While it's too late in the v10 cycle to do anything very meaningful
> about this now, I am tempted to strengthen the deprecation notice's
> wording from "might disappear" to "will disappear".

"Will certainly disappear at some unspecified date" is a lot less convincing than "will disappear in 2021 in Postgres 14". The specific year and version number are irrelevant, but picking and naming a specific one makes it a lot easier to follow through come that date.

-- greg
Re: [HACKERS] building libpq.a static library
On 12 July 2017 at 16:11, Tom Lane wrote:
> Jeroen Ooms writes:
>
>> This works but it's a bit of a pain to maintain. I was wondering if
>> this hack could be merged so that the standard 'configure
>> --enable-static' script would install a static library for libpq
>> alongside the shared one.
>
> FWIW, we used to have support for building static libpq, but
> we got rid of it a long time ago. I couldn't find the exact
> spot in some desultory trawling of the commit history.

Fwiw, I think the real problem is that building static libraries "properly" requires different compiler options -- notably, they're not normally built with -fPIC. That means building every object twice, which kind of breaks make's build model, with its simple dependency graph where each object appears once. Some packages do this by inventing a foo-shared.o and a foo-static.o, but that introduces its own weirdness.

I don't know what the downsides would be of creating a static library out of objects built with -fPIC. It might just be a small performance penalty, which might be no big deal for libpq. That may be a good compromise.

-- greg
Re: [HACKERS] New partitioning - some feedback
On 10 July 2017 at 23:46, David Fetter wrote:
> On Mon, Jul 10, 2017 at 05:33:34PM -0500, Robert Haas wrote:
>> On Mon, Jul 10, 2017 at 2:15 AM, Amit Langote wrote:
>>> I posted a patch upthread which makes \d hide partitions
>>> (relispartition = true relations) and include them if the newly
>>> proposed '!' modifier is specified. The '+' modifier is being
>>> used to show additional detail of relations chosen to be listed at
>>> all, so it seemed like a bad idea to extend its meaning to also
>>> dictate whether partitions are to be listed.
>>
>> +1. That'd be a mess.
>
> With utmost respect, it's less messy than adding '!' to the already
> way too random and mysterious syntax of psql's \ commands. What
> should '\det!' mean? What about '\dT!'?

Fwiw, as (I believe) the first person to make this complaint, I would be fine with + listing all partitions. Imho, adding an orthogonal '!' would be too much mental overhead for the user.

If you want something different, perhaps we can invent ++ for "even more information" and list partitions only if two pluses are provided. (I don't think the other way around makes sense: you might need a way to list permissions and comments without listing every partition if you're on a system with an unmanageable number of partitions, but you never absolutely need a way to list partitions without the comments and permissions.) At least that doesn't require the user to learn a new flag and how it interacts with everything else.

-- greg
Re: [HACKERS] RFC: Key normalization for nbtree
On 10 July 2017 at 19:40, Peter Geoghegan wrote:
> Key normalization means creating a representation for internal page
> items that we always just memcmp(), regardless of the details of the
> underlying datatypes.

One thing I would like to see is features like this added to the opclasses (or opfamilies?) using standard PG functions that return standard PG data types. If each opclass had a function that took the data type in question and returned a bytea, then you could implement that function in whatever language you liked (in theory), test it using standard SQL, and possibly even find other uses for it.

That kind of abstraction would be more promising for the future than yet another C API that is used for precisely one purpose and whose definition is "provide the data needed for this one usage".

-- greg
Re: [HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?
On 6 July 2017 at 15:29, Jim Finnerty wrote:
>
> Feel free to knock down this 'straw man' and propose something better!

I think the pattern in this design that we don't want is that it imposes extra complexity on every user of every page, even when the page doesn't have the problem and even when the problem isn't anywhere in the database. Even years from now, when this problem is long gone, you'll have code paths for dealing with this special page format that are rarely executed and never tested, and that will have to be maintained blind.

A solution that imposes a cost only on the weird pages, and only temporarily, and that leaves the database in a "consistent" state requiring no special processing when reading the data, would be better. The "natural" solution is what was discussed for incompatible page format changes in the past: a point release of one Postgres version that tries to ensure there's enough space on each page for the next version and keeps track of whether there are any problematic pages. You would then be blocked from upgrading until you had ensured all pages had space (presumably by running some special "vacuum upgrade" or something like that).

Incidentally, it's somewhat intriguing to think about what would happen if we *always* did such a tombstone for deletes. Or perhaps only when it's a full-page write. Since the whole page is going into the log, and that tuple will never be modified again, you could imagine just replacing the tuple with the LSN of the deletion and letting anyone who really needs it fetch it from the xlog. That would be a completely different model from the way Postgres works, though -- more like a log-structured storage system.

-- greg
Re: [HACKERS] WIP patch for avoiding duplicate initdb runs during "make check"
On 2 July 2017 at 18:33, Tom Lane wrote:
> system("cp -a ...") call in favor of something more portable.

If we're ok with using Perl, there's File::Copy::Recursive::dircopy(), which does exactly that.

-- greg
Re: [HACKERS] Notes on testing Postgres 10b1
On 7 June 2017 at 01:01, Josh Berkus wrote:
> P3: apparently jsonb_to_tsvector with lang parameter isn't immutable?
> This means that it can't be used for indexing:
>
> libdata=# create index bookdata_fts on bookdata using gin ((
> to_tsvector('english',bookdata)));
> ERROR: functions in index expression must be marked IMMUTABLE

I don't have a machine handy to check on, but isn't this a strange thing to do? Isn't there a GIN opclass on jsonb itself, which would be the default if you didn't have that to_tsvector() call -- and which would also work properly with the jsonb operators?

-- greg
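A minimal sketch of that alternative, assuming the table and column names from the quoted example:

    -- Default jsonb GIN opclass; supports the jsonb operators (@>, ?, etc.)
    CREATE INDEX bookdata_gin ON bookdata USING gin (bookdata);

    -- Or the smaller, faster opclass if only containment (@>) is needed:
    CREATE INDEX bookdata_gin_path ON bookdata USING gin (bookdata jsonb_path_ops);

    -- Queries like this can then use the index:
    SELECT * FROM bookdata WHERE bookdata @> '{"author": "Austen"}';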
Re: [HACKERS] Hash Functions
On 13 May 2017 at 10:29, Robert Haas wrote:
> - Floats. There may be different representations in use on different
> hardware, which could be a problem. Tom didn't answer my question
> about whether any even-vaguely-modern hardware is still using non-IEEE
> floats, which I suspect means that the answer is "no".

Fwiw, the answer to that is certainly no. The only caveat is that some platforms have incomplete IEEE implementations, missing corner cases such as denormalized values, but I don't think that's something that would change with a different hash function.

Personally, while I would like to avoid code that actively crashes or fails basic tests on a VAX, even I don't think we need to worry about replication or federated queries in a heterogeneous environment where some servers are Vaxen and some are modern hardware. That seems a bit far-fetched.

-- greg
Re: [HACKERS] A design for amcheck heapam verification
On 1 May 2017 at 20:46, Robert Haas wrote:
> One problem is that Bloom filters assume you can get
> n independent hash functions for a given value, which we have not got.
> That problem would need to be solved somehow. If you only have one
> hash function, the size of the required bloom filter probably gets
> very large.

There's a simple formula to calculate the optimal number of hash functions and the size of the filter given a target false-positive rate. But I don't think this is as big a problem as you imagine.

a) We don't really have only one hash function; we have a 32-bit hash function, and we could expand that to a larger bit size if we wanted. Bloom filters are never 2^32-bit arrays, for obvious reasons. If you have a 1-kbit bloom filter, each probe needs only 10 bits of index, so you already have three fully independent hash functions available. If we changed to a 64-bit or 128-bit hash function, you would have enough bits for a larger set of hash functions and a larger array.

b) You can get a poor man's universal hash out of hash_any or hash_int by just tweaking the input value in a way that doesn't interact in a simple way with the hash function. Even something as simple as xoring it with a random number (i.e. a vector of random numbers that identify your randomly chosen distinct "hash functions") seems to work fine.

However, for future-proofing and security hardening, I think Postgres should really implement a mathematically rigorous universal hashing scheme, which provides a family of hash functions to pick from randomly. That prevents users from being able to construct data that intentionally performs poorly in hash data structures, for example. But it also gives you a whole family of hash functions to pick from for bloom filters.

-- greg
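For the record, the sizing formula referred to above: for n elements and a target false-positive rate p, the optimal filter size is m = -n*ln(p)/ln(2)^2 bits, with k = -ln(p)/ln(2) hash functions. A quick back-of-the-envelope calculation in SQL as a sketch:

    -- Bloom filter sizing for n = 1 million elements, p = 1% false positives
    SELECT ceil(-n * ln(p) / power(ln(2.0), 2)) AS m_bits,    -- ~9.6M bits (~1.2 MB)
           ceil(-ln(p) / ln(2.0))               AS k_hashes   -- 7
    FROM (VALUES (1000000.0, 0.01)) AS params(n, p);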
Re: [HACKERS] Logical replication in the same cluster
On 1 May 2017 at 19:24, Andres Freund wrote:
>> There is no inherent reason why the CREATE INDEX CONCURRENTLY style of
>> using multiple transactions makes it necessary to leave a mess behind
>> in the event of an error or hard crash. Is someone going to get around
>> to fixing the problem for CREATE INDEX CONCURRENTLY (e.g., having
>> extra steps to drop the useless index during recovery)? IIRC, this was
>> always the plan.
>
> Doing catalog changes in recovery is fraught with problems. Essentially it
> requires starting one worker per database, before allowing access.

The "plan" was to add more layers of PG_TRY and transactions so that if there was an error while building the index, all the remnants of the failed build got cleaned up. But when I actually tried to do it, the problem seemed to metastasize: it was going to require two or three layers of messy nested PG_TRY and extra transactions. Perhaps there's a cleaner way to structure it and I should look again.

I don't recall ever having a plan to do anything in recovery. I think we did talk about why it was hard to mark hash indexes invalid during recovery, which was probably the same problem.

-- greg
Re: [HACKERS] multithreading in Batch/pipelining mode for libpq
On 21 April 2017 at 21:31, Ilya Roublev wrote:
> What I need is to make a huge amount of inserts

This may be a silly question, but I assume you've already considered using server-side COPY? That's currently the most efficient way to load a lot of data.

-- greg
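A minimal sketch of what that looks like (the table and file path are hypothetical):

    -- Server-side COPY reads the file directly on the server:
    COPY big_table (id, payload) FROM '/tmp/data.csv' WITH (FORMAT csv);

    -- From psql, the client-side variant streams over the connection instead:
    -- \copy big_table (id, payload) FROM 'data.csv' WITH (FORMAT csv)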
Re: [HACKERS] Highly Variable Planning Times
On 19 April 2017 at 22:39, Michael Malis wrote:
>> *At best*, you're doing substantial work in the
>> planner to avoid the first tree descent step or two in a single
>> non-partial index.

Fwiw, in addition to replacing the first few levels of the descent with planning-time work, there's also an advantage due to the smaller keys. Effectively these partial indexes are emulating prefix compression in the btree.

> While the specific example I gave in the post could be replaced with a
> non-partial index, most of the partial indexes contain predicates that
> are not straightforward to index with non-partial indexes. About 85%
> of the partial indexes contain a regular expression in them for CSS
> selector matching. I've tried using a trigram GIN index, but it wound
> up not working too well due to the data being highly redundant (almost
> every CSS hierarchy contains 'div' in it). Additionally, most of the
> predicates for partial indexes are extremely specific making the
> partial indexes very small. The sum total size of all of the partial
> indexes is still much smaller than if we were to index every necessary
> field with regular indexes.

I wonder if you could implement an FTS parser that tokenized HTML into just the tokens representing the matching criteria. A GIN index using such a parser would actually be very similar to what you have, as GIN indexes are basically a collection of btrees...

The operational problem with that is that I think it would be even harder to update a parser than to add a new partial index. I don't think you can effectively upgrade a parser to include new tokens without rebuilding any indexes using it. If you wanted to add new selector criteria live, you would probably end up deploying the new parser, building a new index with CREATE INDEX CONCURRENTLY using the new parser, and then dropping the old index.

I'm not sure it's possible to write an FTS parser for handling arbitrary CSS selectors, but if you managed it, that would be a very valuable addition to Postgres, IMHO.

-- greg
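For concreteness, a sketch of the kind of partial index being described (the column name and pattern are invented for illustration):

    -- One tiny index per selector pattern; the predicate does the
    -- matching, so only rows for this selector are indexed at all.
    CREATE INDEX page_events_div_promo ON page_events (occurred_at)
    WHERE css_hierarchy ~ 'div\.promo-banner';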
Re: [HACKERS] Some thoughts about SCRAM implementation
On 14 April 2017 at 20:20, Peter Eisentraut wrote:
>
> Yeah, I think if you're concerned about MITM then you would also be
> concerned about MITM siphoning off your data. So you should be using
> TLS and then you don't need channel binding.

No. You can use TLS for authentication (by verifying SSL certs in both directions), in which case TLS will protect against MITM for you. But if you use TLS only for encryption and still use passwords for authentication, then there's no protection against MITM, because you don't know that the party doing the encryption is the same as the one you authenticated to.

Channel binding is all about tying the authentication mechanism to the encryption, to guarantee that the party doing the encryption is the same as the party you authenticated to. Otherwise someone could MITM the TLS connection and relay the raw bytes of the scram negotiation. Under our md5 auth that gives them your password; under scram they won't get the password, which is a huge improvement, but they would still have the raw unencrypted data from your traffic.

-- greg
Re: [HACKERS] Variable substitution in psql backtick expansion
On 2 April 2017 at 07:53, Fabien COELHO wrote:
> Note that this is already available indirectly, as shown in the
> documentation.
>
>   SELECT some-boolean-expression AS okay \gset
>   \if :okay
>     \echo boolean expression was true
>   \else
>     \echo boolean expression was false
>   \endif

Am I the only one who thinks that even if \if got the ability to evaluate arbitrary SQL queries, I would probably still always write things as above? I think putting arbitrary SQL expressions (let alone queries) in \if would make scripts a total mess, and impossible for humans to parse. Whereas storing the results in psql variables and then using those variables in \if keeps even fairly complex queries and nested \if structures straightforward. It also makes it far clearer in what order the queries will be evaluated and under which set of conditions.

I don't think taking a simple command-line execution environment like psql and trying to embed a complete complex language parser in it is going to result in a sensible programming environment. Having a simple \if is already pushing it. If someone wants anything more complex, I would strongly recommend switching to perl or python rather than trying to code up arbitrary SQL in nested expressions.

-- greg
Re: [HACKERS] Proposal: Local indexes for partitioned table
On 4 April 2017 at 17:10, Maksim Milyutin wrote:
>
> 3. As I noticed early pg_depend table is used for cascade deleting indexes
> on partitioned table and its children. I also use pg_depend to determine
> relationship between parent and child indexes when reindex executes
> recursively on child indexes.
>
> Perhaps, it's not good way to use pg_depend to determine the relationship
> between parent and child indexes because the kind of this relationship is
> not defined. I could propose to add into pg_index table specific field of
> 'oidvector' type that specify oids of dependent indexes for the current
> local index.

Alternatively, you could have a single oid in pg_index on each of the children that specifies which local index is its parent. That would probably require a new index on that column so you could look up all the children efficiently. I think it would behave more sensibly when you're adding or removing a partition, especially if you want to add many partitions in parallel using multiple transactions. An oidvector of children would effectively mean you could only do one partition creation or deletion at a time.

-- greg
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On 21 March 2017 at 20:04, Bruce Momjian wrote:
> Yes, but once it is written it will take years before those bits can be
> used on most installations.

Well, the problem isn't most installations. On most installations it should be pretty straightforward to check the oldest database xid and compare that to when the database was migrated to post-9.0. (Actually there may be some additional code to write, but it's just ensuring that the bits are actually cleared and not merely ignored -- and databases do generally need to be vacuumed more often than on the order of years.)

The problem is that somebody tomorrow could upgrade an 8.4 database to 10.0. In general, even versions we no longer support get extra support for migrating away from; I assume it's better to help people upgrade than to leave them on unsupported versions. And even if you're not concerned about 8.4, someone could still be upgrading 9.4 for years to come.

It probably does make sense to pick a version, say 10.0, and have it go out of its way to ensure it cleans up the MOVED_IN/MOVED_OFF bits, so that we can be sure any database pg_upgraded from 10.0+ doesn't have any left. Then at least we'll know when the bits are available again.

-- greg
Re: [HACKERS] [PATCH] SortSupport for macaddr type
On 18 March 2017 at 22:22, Peter Geoghegan wrote:
>
> Out of idle curiosity, I decided to generate disassembly of both
> macaddr_cmp_internal(), and the patch's abbreviated comparator. The
> former consists of 49 x86-64 instructions at -O2 on my machine,
> totaling 135 bytes of object code. The latter consists of only 10
> instructions, or 24 bytes of object code.

I wonder if there's something that could be optimized out of the normal comparison function, but we're defeating some compiler optimizations with all our casts and aliasing.

-- greg
Re: [HACKERS] [POC] hash partitioning
On 2 March 2017 at 13:03, amul sul wrote:
> create table foo (a integer, b text) partition by hash (a);
> create table foo1 partition of foo with (modulus 4, remainder 0);
> create table foo2 partition of foo with (modulus 8, remainder 1);  -- legal, modulus doesn't need to match
> create table foo3 partition of foo with (modulus 8, remainder 4);  -- illegal, overlaps foo1

Instead of using modulus, why not just divide up the range of hash keys using ranges? That should be just as good for a good hash function (effectively using the high bits instead of the low bits of the hash value), and it would mean you could reuse the machinery for list partitioning for partition exclusion.

It also has the advantage that it's easier to see how to add more partitions: you just split a range (and migrate the data...). There's even the possibility of having uneven partitions if you have data distribution skew -- which can happen even with a good hash function. In a degenerate case you could have a partition for the single hash value of a particularly common key, and a reasonable number of partitions for the remaining hash ranges.

-- greg
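A sketch of what that could look like with the existing range-partitioning machinery (PostgreSQL 10 syntax, using the built-in hashint4() function and UNBOUNDED range ends; purely illustrative, not the proposed feature):

    CREATE TABLE foo (a integer, b text) PARTITION BY RANGE ((hashint4(a)));

    -- Split the signed 32-bit hash space into ranges; splits could be
    -- uneven if the data distribution is skewed.
    CREATE TABLE foo1 PARTITION OF foo FOR VALUES FROM (UNBOUNDED) TO (-1073741824);
    CREATE TABLE foo2 PARTITION OF foo FOR VALUES FROM (-1073741824) TO (0);
    CREATE TABLE foo3 PARTITION OF foo FOR VALUES FROM (0) TO (1073741824);
    CREATE TABLE foo4 PARTITION OF foo FOR VALUES FROM (1073741824) TO (UNBOUNDED);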
Re: [HACKERS] UPDATE of partition key
On 24 February 2017 at 14:57, David G. Johnston wrote:
> I dislike an error. I'd say that making partition "just work" here is
> material for another patch. In this one an update of the partition key can
> be documented as shorthand for delete-returning-insert with all the
> limitations that go with that. If someone acceptably solves the ctid
> following logic later it can be committed - I'm assuming there would be no
> complaints to making things just work in a case where they only sorta
> worked.

Personally, I don't think there's any hope that there will ever be cross-table ctid links. Maybe one day there will be a major new table storage format with very different capabilities, but in the current architecture it seems like an impossible leap.

I would expect everyone to come to terms with the basic idea that partition key updates are always going to be a corner case. The user defined the partition key, and the docs should carefully explain the impact of that definition. As long as that explanation gives them something they can work with and lets them manage the consequences, that's fine.

What I'm concerned about is that silently giving "wrong" answers in regular queries -- not even the ones doing the partition key updates -- is something the user can't manage. They have no way to rewrite the query to avoid the problem if some other user or part of their system is updating partition keys. They have no way to know the problem is even occurring.

Just to spell it out: it's not just "no-op updates" where the user sees 0 records updated. If I update all records where username='stark', perhaps to set the "user banned" flag, and get back "9 records updated", and later find out that I missed a record because someone changed the department_id while my query was running -- how would I even know? How could I possibly rewrite my query to avoid that?

The reason I suggested throwing a serialization failure was that I thought it would be the easiest shortcut to the problem. I had imagined that having a bit pattern indicating such a move would be a pretty minor change. I would actually consider using a normal update bitmask with InvalidBlockId in the ctid to indicate that the tuple was updated and the target of the chain isn't available. That may be something we'll need in the future for other cases too.

Throwing an error means the user has to retry their query, but that's at least something they can do. Even if they don't do it automatically, the ultimate user will probably just retry whatever operation errored out anyway. But at least their database isn't logically corrupted.

-- greg
Re: [HACKERS] Make subquery alias optional in FROM clause
On 23 February 2017 at 13:27, Greg Stark wrote:
> => SELECT "?column"? FROM (select 1+1 as "?column?", 1+1) AS x;

Oops, I missed the typo there:

=> SELECT "?column?" FROM (select 1+1 AS "?column?", 1+1) AS x;
ERROR:  42702: column reference "?column?" is ambiguous
LINE 2: SELECT "?column?" FROM (select 1+1 AS "?column?", 1+1) AS x;
               ^
LOCATION:  scanRTEForColumn, parse_relation.c:669

-- greg
Re: [HACKERS] Make subquery alias optional in FROM clause
On 22 February 2017 at 15:08, Tom Lane wrote:
> Indeed. When I wrote the comment you're referring to, quite a few years
> ago now, I thought that popular demand might force us to allow omitted
> aliases. But the demand never materialized. At this point it seems
> clear to me that there isn't really good reason to exceed the spec here.
> It just encourages people to write unportable SQL code.

Oh my. This bothers me all the time. I always assumed the reason it was like this was that the grammar would be ambiguous without the alias and would require extreme measures to hack around. If it's this easy, I would totally be for it.

Offhand, I think there are plenty of solutions for the problem of inventing names, and I suspect any of them would work fine:

1) Don't assign a name -- I would guess this would require some adjustments in the rule deparsing (i.e. views).

2) Assign a name, but add a flag indicating the name is autogenerated, shouldn't be used for resolving references, and shouldn't be dumped. Then it shouldn't really matter if there's a conflict, since the name would only be used for things like error messages, not for resolving references.

3) Thumb through all the names in the query and pick one that doesn't conflict.

For what it's worth, while it wouldn't be a *bad* thing to avoid conflicts, I think this is being held to an inconsistent standard. It's not like there aren't similar situations elsewhere in the codebase where we just don't worry about this kind of thing:

=> SELECT "?column"? FROM (select 1+1 as "?column?", 1+1) AS x;
ERROR:  42703: column "?column" does not exist
LINE 2: SELECT "?column"? FROM (select 1+1 as "?column?", 1+1) AS x;
               ^
HINT:  Perhaps you meant to reference the column "x.?column?" or the column "x.?column?".

-- greg
Re: [HACKERS] Sum aggregate calculation for single precsion real
On 15 February 2017 at 12:52, Robert Haas wrote:
> Personally, I find it somewhere in the middle: I think the way it
> works now is reasonable, and I think what he wants would have been
> reasonable as well. However, I find it hard to believe it would be
> worth changing now on backward compatibility grounds.

Honestly, I was quite surprised to discover that sum(float4) worked this way, because I assumed it followed the same pattern as the integer types. But I wonder what you mean by backward compatibility grounds. If sum(float4) returned a float8, the only compatibility issue would be someone who did CREATE TABLE AS and expected to get a float4 column but got a float8 column instead. That seems like a much more minor corner case than most any other backward-incompatible change in any release.

Moreover, it wouldn't be hard to make sum(float4) use a float8 as an accumulator and then cast to float4 for the final state. That would be 100% compatible with the existing behaviour, aside from producing more accurate results.

(Though as an aside, I think Konstantin would be much better served by using integers and storing cents, or whatever unit of currency is small enough. That would actually give accurate results, which neither float4 nor float8 guarantees.)

-- greg
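A sketch of that accumulator idea using only existing catalog functions (the aggregate name is invented, and this isn't a proposal for this exact definition):

    -- Accumulate in double precision, cast back to real at the end.
    CREATE AGGREGATE sum_via_float8 (float4) (
        SFUNC     = float84pl,  -- float8 state + float4 input -> float8
        STYPE     = float8,
        FINALFUNC = float4      -- cast the final state back down to float4
    );

    SELECT sum_via_float8(x) FROM measurements;  -- hypothetical table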
Re: [HACKERS] UPDATE of partition key
On 13 February 2017 at 12:01, Amit Khandekar wrote:
> There are a few things that can be discussed about:

If you do a normal update, the new tuple is linked to the old one using the ctid, forming a chain of tuple versions. This tuple movement breaks that chain. So the question I had reading this proposal is: what behaviour depends on the ctid, and how is it affected by the chain being broken?

I think the concurrent update case is just a symptom of this. If you try to update a row that's locked for a concurrent update, you normally wait until the concurrent update finishes, then follow the ctid chain, recheck the WHERE clause on the target of the link, and if it still matches, perform the update there. At least you do that if you have the isolation level set to repeatable read. If the isolation level is set to serializable, you just fail with a serialization failure.

I think that's what you should do if you come across a row that's been updated with a broken ctid chain, even in repeatable read mode: just fail with a serialization failure, and document that for partitioned tables, if you perform updates that move tuples between partitions, you need to ensure your updates are prepared for serialization failures.

I think this would require another bit in the tuple infomask indicating that this tuple is the last version before a broken ctid chain -- i.e. that it was updated by moving it to another partition. Maybe there's some combination of bits you could use, though, since this is only needed in a particular situation.

Offhand, I don't know what other behaviours are dependent on the ctid chain. I think you need to go search the docs -- and probably the code, just to be sure -- for any references to ctid, to ensure you catch every impact of breaking the chain.

-- greg
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
On 11 February 2017 at 23:45, Corey Huinker wrote:
> So you'd just want to know nesting depth, with no indicator of true/false?

Even that's more than bash does, for example:

    $ if true ; then
    > if false ; then
    > :
    > fi
    > fi

I'm a bit confused about how the true/false is actually valuable. It doesn't tell you how the expression evaluated, just where you are in the code you're typing in, which you can tell equally well by looking at what code you're typing in. The reason the nesting level is handy is just to remind you in case you forget.

For debugging scripts it would be handy to have some way to tell whether the \if expression actually evaluated to true or false, but that wouldn't be in the prompt, I don't think.

-- greg
Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)
On 10 February 2017 at 21:36, Fabien COELHO wrote:
>> command     prompt is now
>> ---         ---
>> \echo bob   ''        = initial state, no branch going on at all
>> \if yes     't'       = inside a true branch
>> \if no      'tf'      = false inside a true
>> \endif      't'       = back to just the true branch
>> \if yes     'tt'
>> \if yes     'ttt'
>> \if yes     '...ttt'  = only show the last 3, but let it be known that
>>                         there's at least one more
>> \else       '...ttz'  = past the point of a true bit of this branch
>
> I like the "tfz" idea. I'm not sure whether the up to 6 characters is a
> good idea, though.

I haven't been following this thread, but just skimming through it for the first time, this struck me as more baroque than I was expecting. I was expecting something like a '{' for each level of nested \if you're in, so you can see how many levels deep you are. I didn't expect to see anything more complex than that.

-- greg
Re: [HACKERS] GSoC 2017
On 27 January 2017 at 14:52, Thomas Kellerer wrote:
>
> I don't have the exact syntax at hand, but it's something like this:
>
> create distinct type customer_id_type as integer;
> create distinct type order_id_type as integer;
>
> create table customers (id customer_id_type primary key);
> create table orders (id order_id_type primary key, customer_id
> customer_id_type not null);

That seems like a useful thing, but it's not exactly the same use case. Measurements with units and currency amounts both have the property that you are likely to want a single column that uses different units for different rows. You can aggregate across them without converting, as long as you have an appropriate WHERE clause or GROUP BY clause -- GROUP BY units_of(debit_amount), for example.

-- greg
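One way to sketch that "single column, mixed units" shape with today's tools (all names here are hypothetical):

    -- Tag every amount with its unit...
    CREATE TYPE measured AS (amount numeric, unit text);
    CREATE TABLE ledger (debit measured);

    -- ...and aggregate without converting, by grouping on the unit:
    SELECT (debit).unit, sum((debit).amount) AS total
    FROM ledger
    GROUP BY (debit).unit;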
Re: [HACKERS] GSoC 2017
On 24 January 2017 at 03:42, Peter van Hardenberg wrote:
> The basic concept is that the value of a currency type is that it would
> allow you to operate in multiple currencies without accidentally adding
> them. You'd flatten them to a single type if, when, and how you wanted for any
> given operation but could work without fear of losing information.

I don't think this even needs to be tied to currencies. I've often thought this would be generally useful for any value with units. It would prevent you from accidentally adding miles to kilometers or hours to parsecs, which is just as valid as preventing you from adding CAD to USD.

Then you could imagine having a few entirely optional helper functions that could automatically provide conversion factors using units.dat or currency exchange rates. But even if you don't use these helper functions the types would still be useful.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
On 25 January 2017 at 20:06, Jim Nasby wrote:
> GUCs support SET LOCAL, but that's not the same as local scoping because the
> setting stays in effect unless the subtransaction aborts. What I'd like is the
> ability to set a GUC in a plpgsql block *and have the setting revert on
> block exit*.

I'm wondering which GUCs you have in mind to use this with, because what you're describing is dynamic scoping, and I'm wondering if what you're really looking for is lexical scoping. That would be more in line with how PL/PgSQL variables are scoped and with how #pragmas usually work. But it would probably not be easy to reconcile with how GUCs work.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On 26 January 2017 at 01:58, Thomas Munro wrote:
>
> I don't know how comparable it is to our checksum technology, but
> MySQL seems to have some kind of checksums on table data, and you can
> find public emails, blogs etc lamenting corrupted databases by
> searching Google for the string "InnoDB: uncompressed page, stored
> checksum in field1" (that's the start of a longer error message that
> includes actual and expected checksums).

I'm not sure what exactly that teaches us, however. I see these were often associated with software bugs (apparently MySQL long assumed that a checksum of 0 never happened, for example). Every non-software case I stumbled across seemed to follow a power failure. Apparently MySQL uses a "doublewrite buffer" to protect against torn pages, but when I search for that I get tons of people inquiring how to turn it off... So even without software bugs in the checksum code I don't know that the frequency of the error necessarily teaches us anything about the frequency of hardware corruption either.

And more to the point, it seems what people are asking for in all those lamentations is how they can convince MySQL to continue and ignore the corruption. A typical response was "We slightly modified innochecksum and added option -f that means if the checksum of a page is wrong, rewrite it in the InnoDB page header." Which begs the question...

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
I tend to agree. But in the past when this came up people pointed out you could equally keep the hard-coded check and still grant all the access you wanted using SECURITY DEFINER. Arguably that's a better approach, because then instead of auditing the entire monitoring script you only need to audit this one wrapper function, pg_ls_monitor_dir(), which just calls pg_ls_dir() on this one directory.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online enabling of page level checksums
On Jan 22, 2017 11:13 AM, "Magnus Hagander" wrote:

Yes, this means the entire db will end up in the transaction log since everything is rewritten. That's not great, but for a lot of people that will be a trade they're willing to make since it's a one-time thing. Yes, this background process might take days or weeks - that's OK as long as it happens online.

I'm not sure that's actually necessary. You could just log a WAL record saying "checksum this block" and if it gets replayed then recalculate the checksum on that block again. This record could be exempt from the usual rules for having a full-page write; there's no danger of torn pages from the checksum alone.

The danger would be if some other operation dirties that page: then you need to know that the page is in this weird in-between state where it's dirty but hasn't yet had a full-page write. I'm not sure whether it's worth the infrastructure to have such a state just for this or not. On the other hand it sounds like something that would be useful. A sketch of what I mean is below.
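To illustrate, here's roughly the shape I have in mind for the redo side. This is a hypothetical sketch -- the record name, fields, and helper are all invented for illustration and are not the real xlog APIs:

/* hypothetical WAL record: "recompute the checksum of this block" */
typedef struct xl_checksum_block
{
    unsigned    spcNode;        /* stand-ins for the block's identity */
    unsigned    dbNode;
    unsigned    relNode;
    unsigned    blkno;
} xl_checksum_block;

/*
 * Hypothetical redo routine.  No full-page image is needed: if the write
 * was torn, the only change being replayed is the checksum itself, so
 * reading whatever made it to disk and recomputing the checksum over it
 * yields a correct page again.
 */
static void
checksum_block_redo(const xl_checksum_block *xlrec)
{
    (void) xlrec;
    /* read the block, recompute and set its checksum, mark the buffer dirty */
}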
Re: [HACKERS] RustgreSQL
On 8 January 2017 at 21:50, Jim Nasby wrote:
> Somewhat related to that... it would be useful if Postgres had "fenced"
> functions; functions that ran in a separate process and only talked to a
> backend via a well defined API (such as libpq). There's two major advantages
> that would give us:

The problem with this is that any of the "interesting" extensions need to use the server API. That is, they need to be able to do things like throw errors, expand toast data, etc.

IMHO just about anything you could do in an external process would be something you could much more easily and conveniently do in the client. And it would be more flexible and scalable as well, as it's a lot easier to add more clients than it is to scale up the database.

That said, there were several pl language implementations that worked this way. IIRC one of the Java pl languages ran in a separate Java process.

I think the solution to the problem you're describing is the project formerly known as NaCl: https://en.wikipedia.org/wiki/Google_Native_Client

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
On 8 January 2017 at 17:26, Greg Stark wrote:
> On 5 January 2017 at 19:01, Andres Freund wrote:
>> That's a bit odd - shouldn't the OS network stack take care of this in
>> both cases? I mean either is too big for TCP packets (including jumbo
>> frames). What type of OS and network is involved here?
>
> 2x may be plausible. The first 128k goes out, then the rest queues up
> until the first ack comes back. Then the next 128kB goes out again
> without waiting... I think this is what Nagle is supposed to actually
> address but either it may be off by default these days or our usage
> pattern may be defeating it in some way.

Hm. That wasn't very clear, and the more I think about it, it's not right.

The first block of data -- one byte in the worst case, 128kB in our case -- gets put in the output buffers and, since there's nothing stopping it, it immediately gets sent out. Then all the subsequent data gets put in output buffers but buffers up due to Nagle, until there's a full packet of data buffered, the ack arrives, or the timeout expires, at which point the buffered data drains efficiently in full packets. Eventually it all drains away, and the next 128kB arrives and is sent out immediately. So most packets are full size, with the occasional 128kB packet thrown in whenever the buffer empties. And I think even when the 128kB packet is pending, Nagle only stops small packets, not full packets, and the window should allow more than one packet of data to be pending.

So, uh, forget what I said. Nagle should be our friend here. I think you should get network dumps and use xplot to understand what's really happening, c.f. https://fasterdata.es.net/assets/Uploads/20131016-TCPDumpTracePlot.pdf

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
On 5 January 2017 at 19:01, Andres Freund wrote:
> That's a bit odd - shouldn't the OS network stack take care of this in
> both cases? I mean either is too big for TCP packets (including jumbo
> frames). What type of OS and network is involved here?

2x may be plausible. The first 128k goes out, then the rest queues up until the first ack comes back. Then the next 128kB goes out again without waiting... I think this is what Nagle is supposed to actually address but either it may be off by default these days or our usage pattern may be defeating it in some way.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_activity.waiting_start
On 6 January 2017 at 02:59, Bruce Momjian wrote:
>
> Agreed. No need in adding overhead for short-lived locks because the
> milli-second values are going to be meaningless to users. I would be
> happy if we could find some weasel value for non-heavyweight locks.

For what it's worth I don't think this is true. It may be true that we don't need precise measurements of short-lived locks, but we do need accurate measurements even if they're in the expected range. What users need to know is, in aggregate, how much of the time the database spends working on their queries goes into each state. Even if no LWLock is held for more than a few milliseconds at a time, if it turns out that 80% of the time is being spent waiting on LWLocks then there's no point in worrying about cpu speed, for example. And knowing *which* LWLock is taking up an aggregate of 80% of the time would point at either configuration changes or code changes to reduce that contention.

I would actually argue the reverse of the above proposal would be more useful. What we need are counts of how often LWLocks take longer than, say, 50ms, and for shorter waits we need to know how long they took -- perhaps not precisely for individual waits, but in aggregate we need the totals to be right, so as long as the measurements are accurate that would be sufficient. An accurate but imprecise measurement, +/- 10ms, with low overhead is better than a precise measurement with high overhead. Something like the sketch below.
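Concretely -- and this is just an illustrative sketch with invented names, not anything from the actual lwlock code -- the bookkeeping I'm describing is no more than this:

#include <stdint.h>

#define LONG_WAIT_THRESHOLD_US 50000    /* 50ms */

typedef struct LWLockWaitStats
{
    uint64_t    total_wait_us;  /* the aggregate total must be accurate... */
    uint64_t    wait_count;     /* ...even if each sample is only +/- 10ms */
    uint64_t    long_waits;     /* count of waits over the threshold */
} LWLockWaitStats;

static inline void
record_wait(LWLockWaitStats *stats, uint64_t wait_us)
{
    stats->total_wait_us += wait_us;
    stats->wait_count++;
    if (wait_us > LONG_WAIT_THRESHOLD_US)
        stats->long_waits++;
}

One such struct per LWLock (or per tranche) would be enough to answer "which lock is eating 80% of our time" without any per-wait precision.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers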
Re: [HACKERS] Cluster wide option to control symbol case folding
On 25 December 2016 at 09:40, Lewis, Ian (Microstar Laboratories) wrote:
> So, the current behavior already breaks many tools unless one accepts
> that all symbols on the server are lower case. At root, based on reading
> the threads you provided, this probably indicates defects in the tools,
> rather than a problem with PostgreSQL. My reading of the standard text
> quoted in various places is that any mixed case identifier returned from
> the catalog has to be quoted to match in a query (whether you fold to
> lower or upper case).

Well, tools that work with user-defined columns and make assumptions that they don't require quoting are just buggy. But the problem with configurable quoting rules is a bit different.

Imagine your application later decides to depend on PostGIS. So you load the PostGIS extension and perhaps also some useful functions you found on Stack Overflow for solving some GIS problem you have. Those extensions will create objects and then work with those objects and may use CamelCase for clarity -- in fact I think PostGIS functions are documented as CamelCase. The PostGIS extensions might not work on your system with different case rules if they haven't been 100% consistent with their camelCasing, and the functions from Stack Overflow would be even less likely to work.

If there was some way to scope this setting lexically so it only affected code that's defined in a specific place, that might be safer. But I don't think things are currently organized that way. If you're only concerned with server-side functions it wouldn't be hard to have a specific pl language that was case sensitive, though it might be tricky to do for pl/pgsql due to the way pl/pgsql depends on the sql parser.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_activity.waiting_start
On Dec 24, 2016 5:44 PM, "Tom Lane" wrote:

I think we'd need at least an order of magnitude cheaper to consider putting timing calls into spinlock or lwlock paths, and that's just not available at all, let alone portably.

For spinlocks we could conceivably just bite the bullet and use a raw rdtsc, or the equivalent for other platforms. It might be pretty easy to distinguish sane numbers from numbers that result from a process reschedule, and we could just discard the data when that happens (or count occurrences). Something like the sketch below.

That may possibly work for spinlocks, but it won't work for anything heavier, where process reschedules are routine.
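For illustration, a minimal sketch of that idea using the GCC/Clang __rdtsc() intrinsic (x86-only; the cutoff constant and all the names here are made up for the example):

#include <stdint.h>
#include <x86intrin.h>

/* anything this long almost certainly straddled a reschedule */
#define RESCHEDULE_CUTOFF_CYCLES 1000000

typedef struct SpinTimingStats
{
    uint64_t    total_cycles;   /* believed-sane samples only */
    uint64_t    samples;
    uint64_t    discarded;      /* samples that hit a reschedule */
} SpinTimingStats;

static inline void
record_spin_sample(SpinTimingStats *stats, uint64_t start, uint64_t end)
{
    uint64_t    elapsed = end - start;

    if (elapsed > RESCHEDULE_CUTOFF_CYCLES)
        stats->discarded++;     /* don't let a context switch pollute the data */
    else
    {
        stats->total_cycles += elapsed;
        stats->samples++;
    }
}

Usage would be: take __rdtsc() before acquiring the spinlock, take it again after releasing, and feed both to record_spin_sample().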
Re: [HACKERS] gettimeofday is at the end of its usefulness?
On Dec 26, 2016 10:35 PM, "Tom Lane" wrote:

So it seems like the configure support we'd need is to detect whether clock_gettime is available (note on Linux there's also a library requirement, -lrt), and we would also need a way to provide a platform-specific choice of clockid; we at least need enough smarts to use CLOCK_MONOTONIC_RAW on macOS.

This seems like something that really should be checked at runtime. It's very specific to the particular kernel you're running on, not the build environment, and it can hopefully be measured in only a second or even a fraction of a second. The only problem would be if other things running on the system made the test results unpredictable, so that you had a small chance of getting a very suboptimal choice and were re-rolling the dice each time you restarted...
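Something along these lines is what I mean -- a rough standalone sketch with an arbitrary iteration count and candidate list, which just times how expensive each clockid the running kernel actually supports is (on older glibc, link with -lrt):

#include <stdio.h>
#include <time.h>

static long long
probe_cost_ns(clockid_t id)
{
    struct timespec t0, t1, tmp;
    int         i;

    if (clock_gettime(id, &tmp) != 0)
        return -1;              /* this clock isn't available on this kernel */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < 100000; i++)
        clock_gettime(id, &tmp);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);
}

int
main(void)
{
    printf("CLOCK_MONOTONIC:     %lld ns\n", probe_cost_ns(CLOCK_MONOTONIC));
#ifdef CLOCK_MONOTONIC_RAW
    printf("CLOCK_MONOTONIC_RAW: %lld ns\n", probe_cost_ns(CLOCK_MONOTONIC_RAW));
#endif
    return 0;
}

The startup code would run each probe once and pick whichever supported clock is cheapest, perhaps with a tie-breaking preference for higher resolution.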
Re: [HACKERS] Declarative partitioning - another take
Just poking around with partitioning. I notice that "\d parent" doesn't list all the partitions, suggesting to use \d+, but a plain "\d" does indeed list the partitions. That seems a bit strange, and also probably impractical if you have hundreds or thousands of partitions.

Has this come up in previous discussions? Unfortunately it's proving a bit hard to search for "\d" :/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb problematic operators
On 12 December 2016 at 04:59, Craig Ringer wrote:
> I didn't realise Pg's use of ? was that old, so thanks. That makes
> offering alternatives much less appealing.

One option might be for Postgres to define duplicate operator names using ¿ or something else. I think ¿ is a good choice because it's a common punctuation mark in Spanish, so it's probably not hard to find on a lot of keyboards or hard to find instructions on how to type one.

There is always a risk in allowing redundant syntaxes though. For example, people running grep to find all uses of an operator will miss the alternate spelling. There may even be security implications to that, though to be honest that seems unlikely in this case.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Separate connection handling from backends
On 5 December 2016 at 19:48, Jim Nasby wrote:
> One solution to this would be to segregate connection handling from actual
> backends, somewhere along the lines of separating the main loop from the
> switch() that handles libpq commands. Benefits:

I'm kind of mystified how a simple code restructuring could solve the fundamental problems with a large number of backends. It sounds like what you're describing would just push the problem around: you would end up with some other maximum instead, max_backends, or max_active_backends, or something like that, with the same problems. At best it would help people who have connection pooling or only a few connections active at any given time.

Heikki's work with CSNs would actually address the main fundamental problem. Instead of having to scan PGPROC when taking a snapshot, taking a snapshot would be O(1). There might need to be scans of the list of active transactions, but never of all connections, whether they're in a transaction or not.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Mail thread references in commits
On 30 November 2016 at 16:19, Andrew Dunstan wrote:
>
> https://www.postgresql.org/message-id/cab7npqthydyf-fo+fzvxrhz-7_hptm4rodbcsy9-noqhvet...@mail.gmail.com
>
> I'll be interested to know if it breaks anyone's MUA. If it doesn't all we
> will be arguing about are aesthetics, and I'm a firm believer in function
> over form.

I can't say I feel especially strongly either way on this, but just to toss out a small thing that might make a small difference: if you happen to know how your message-ids are generated then you might be able to do something useful with them. For instance, you could search all git commits for urls to messages you wrote -- any commit that has CAB7nPq is referencing a message written by Michael Paquier.

On the other hand you could put something naughty in the message-id, forcing everyone else to use URLs with dirty words in them. Or with words like "terrorist" in them. Or with some javascript/html injection attack of some sort...

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNDO and in-place update
On 24 November 2016 at 23:03, Robert Haas wrote:
>> For snapshot isolation Oracle has yet a *third* copy of the data in a
>> space called the "rollback segment(s)".
>
> My understanding is that this isn't correct. I think the rollback
> segments are what they call the thing that stores UNDO. See e.g.
> http://ss64.com/ora/syntax-redo.html

It looks like you're right. Rollback segments and undo segments are two different pieces of code, but one is just the old way and the other the new way of managing the same data. You can't have both active in the same database at the same time.

I'm a bit confused because I distinctly remembered an UNDO log back in the 8i days as well, but apparently that's just me imagining things. Undo segments were introduced in 9i. This explained a bunch: http://satya-dba.blogspot.ie/2009/09/undo-tablespace-undo-management.html

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNDO and in-place update
On 23 November 2016 at 04:28, Peter Geoghegan wrote:
> On Tue, Nov 22, 2016 at 7:01 PM, Robert Haas wrote:
>> This basic DO-UNDO-REDO protocol has been well-understood for
>> decades.
>
> FWIW, while this is basically true, the idea of repurposing UNDO to be
> usable for MVCC is definitely an Oracleism. Mohan's ARIES paper says
> nothing about MVCC.

Fwiw, Oracle does not use the undo log for snapshot fetches. It's used only for transaction rollback and recovery. For snapshot isolation Oracle has yet a *third* copy of the data, in a space called the "rollback segment(s)". When you update a row in a block you save the whole block in the rollback segment. When you try to access a block you check if the CSN -- which is basically equivalent to our LSN -- is newer than your snapshot, and if it is you fetch the old version of the block from the rollback segment.

Essentially their MVCC is done at a per-block level rather than a per-row level, and they keep only the newest version of the block in the table; the rest are in the rollback segment. For what it's worth I think our approach is cleaner and more flexible. They had a lot of trouble with their approach over the years, and it works well only because they invested an enormous amount of development in it and also because people throw a lot of hardware at it too.

I think the main use case we have trouble with is actually the "update every row in the table" type of update, which requires that we write to every block, plus a second copy of every block, plus write full pages of both copies, then later set hint bits, dirtying pages again and generating more full-page writes, then later come along and vacuum, which requires two more writes of every block, etc. If we had a solution for the special case of an update that replaces every row in a page, that would complement HOT nicely and go a long way towards fixing our issues.

Incidentally, the "interested transaction list" is for locking rows for updates, and it's basically similar to what we've discussed before of having a "most frequent xmin" in the page header and then a bit indicating the xmin is missing from the row header. Except in their case they don't need it for the actual xmin/xmax, because their visibility is done per-block; they need it only for the transient lock state.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Do we need use more meaningful variables to replace 0 in catalog head files?
On Tue, Nov 15, 2016 at 4:50 PM, Robert Haas wrote:
> On Sun, Nov 13, 2016 at 9:48 AM, Andrew Dunstan wrote:
>> I'm not convinced the line prefix part is necessary, though. What I'm
>> thinking of is something like this:
>>
>> PROCDATA( oid=1242 name=boolin isstrict=t volatile=i parallel=s nargs=1
>> rettype=bool argtypes="cstring" src=boolin );
>
> I liked Tom's format a lot better. If we put this in a separate file
> rather than in the header, which I favor, the PROCDATA stuff is just
> noise. On the other hand, having the name as the first thing on the
> line seems *excellent* for readability.

Just throwing this out there: it would be neat if the file format was precisely a tab- or comma-separated file suitable for loading into the appropriate table with COPY or loading into a spreadsheet. Then we might be able to maintain it by editing the table using SQL updates and/or other tools without having to teach them a particular input format.

The trick would then be to have a preprocessing step in the build which loaded the CSV/TSV files into hash tables and replaced all the strings or other tokens with OIDs and magic values.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Physical append-only tables
On Sun, Nov 13, 2016 at 3:45 PM, Magnus Hagander wrote:
> For a scenario like this, would it make sense to have an option that could
> be set on an individual table making it physical append only? Basically
> VACUUM would run as normal and clean up the old space when rows are deleted
> back in history, but when new space is needed for a row the system would
> never look at the old blocks, and only append to the end.

I don't think "appending" is the right way to think about this. It happens to address the problem, but only accidentally and only partially. More generally, what you have is two different kinds of data with two different access patterns and storage requirements in the same table. They're logically similar but have different practical requirements.

If there were some way to teach the database that your table is made of two different types of data and how to distinguish the two types, then when the update occurs it could move the row to the right section of storage... This might be something the new partitioning could handle, or it might need something more low-level and implicit.

That said, I don't think the "maintain clustering a bit better using BRIN" idea is a bad one. It's just the bit about turning a table append-only to deal with update-once data that I think is overreach.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JIT compiler for expressions
This sounds amazing. My only comment is that LLVM 3.7 is kind of old in the accelerated world of LLVM. If you have patches to LLVM that you need, you won't have much success submitting them as patches against 3.7. The current stable release is 3.9 and the development snapshots are 4.0.

I know LLVM moves quickly and that makes it hard to track its development. If you worked with 4.0 you might find the APIs you're using getting deprecated and rewritten several times while your project is under development.
Re: [HACKERS] emergency outage requiring database restart
On Thu, Oct 27, 2016 at 9:53 PM, Merlin Moncure wrote:
> I think we can rule out faulty storage

Nobody ever expects the faulty storage

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_hba_file_settings view patch
On Wed, Oct 26, 2016 at 11:04 PM, Joshua D. Drake wrote:
> On 10/26/2016 12:54 PM, Josh Berkus wrote:
>> I mean, I'm not particularly in favor of using JSON for this (arrays
>> seem OK), but that seems like an invalid reason not to.
>
> -1 to JSON for this.

Sigh. Well, I tried to review this patch in a previous iteration so let me give some context.

The fundamental problem is that the pg_hba.conf file has some bits of complex structure that aren't easily captured by linear arrays. The problem I struggled with most was the keywords like "all", "samerole", and "replication". A simple array of text makes it awkward to distinguish those keywords from quoted text values with the same content. And then there are the ldap options, which naturally would be a data type like json or hstore.

Some people wanted to store strings like '"all"' with the quotes, which I thought was ugly and functionally less useful because it would be hard to query and impossible to join against things like pg_user. Others wanted to give up the idea of expanding the entries at all and just have a single string for the whole line, which I thought was pointless -- you may as well just read the file then.

Personally my recommendation was to ignore the problem: just have arrays of text and document that if you have a real user by the name "all" or "samerole" then the view cannot be interpreted accurately. Tools like pgadmin which want to use the view could check for such users and display a warning or error rather than inaccurate information.

If there's any support for my recommendation I'm still happy to pick up the patch again and commit it.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Fri, Oct 21, 2016 at 9:03 PM, Stephen Frost wrote:
> WARNING: The following essential packages will be removed.
> This should NOT be done unless you know exactly what you are doing!
>   login
> 0 upgraded, 0 newly installed, 1 to remove and 71 not upgraded.
> After this operation, 1,212 kB disk space will be freed.
> You are about to do something potentially harmful.
> To continue type in the phrase 'Yes, do as I say!'

Another case was:

glxgears -iacknowledgethatthistoolisnotabenchmark

which was required to get it to print the FPS for a while.

The problem -- and also the advantage -- of this is that it's scriptable. That means people can still put it in recipe books and scripts, and others can copy it without being aware of what it does or even that they're doing it. I think the apt-get behaviour was specifically designed to ensure it couldn't easily be put into a script, which I would have said was desirable -- except I suspect there are situations where Postgres database scripts need to do a resetxlog. I'm not sure I can think of any examples offhand but I wouldn't be too surprised.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On Oct 20, 2016 5:27 PM, "Noah Misch" wrote:
>
> On Wed, Oct 19, 2016 at 11:08:39AM +0100, Greg Stark wrote:
>
> > The MEMPOOL_FREE doesn't take any size argument and mcxt.c doesn't
> > have convenient access to a size argument. It could call the
> > GetChunkSpace method but that will include the allocation overhead and
>
> That is indeed a problem for making VALGRIND_MEMPOOL_FREE() an alias of
> VALGRIND_MAKE_MEM_NOACCESS() under ASAN as I suggested. Calling
> GetMemoryChunkSpace() in the macro would cause memdebug.h to embed an
> assumption of mcxt.c, which is messy. Including the allocation overhead is
> fine, though.

I think the way out is to simply have aset.c make the memory undefined/noaccess even if it's redundant under valgrind. It's a bit unfortunate that the macros would have different semantics under the two tools. If it's too confusing, or we're worried about the performance overhead, we could make a MAKE_MEM_{UNDEFINED,NOACCESS}_IF_NO_MEMPOOL(), but I don't think it's worth it myself.

> > in any case isn't this memory already marked noaccess by aset.c?
>
> Only sometimes, when AllocSetFree() happens to call free() or wipe_mem().

I think if I did further surgery here I would rename wipe_mem and randomise_mem and make them responsible for making the memory undefined and noaccess as well. They would always be defined, so that would get rid of all the ifdefs except within those functions. Roughly the shape sketched below.

I have a patch working under asan on both gcc and clang. That handles noaccess but not undefined memory reads. I need to try msan to make sure the macro definitions work well for that API too. There are a couple of build oddities with both gcc and clang to sort out before I can commit anything though. And I can't test valgrind to be sure the redundant calls aren't causing a problem.
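For concreteness, here's a rough standalone sketch of the shape I mean. The structure and fallbacks are invented, not the real memdebug.h, though VALGRIND_MAKE_MEM_UNDEFINED/NOACCESS and ASAN_(UN)POISON_MEMORY_REGION are the real tool interfaces:

#include <string.h>

#if defined(USE_VALGRIND)
#include <valgrind/memcheck.h>
#elif defined(__SANITIZE_ADDRESS__)    /* gcc's macro; clang needs __has_feature() */
#include <sanitizer/asan_interface.h>
#define VALGRIND_MAKE_MEM_UNDEFINED(p, n)  ASAN_UNPOISON_MEMORY_REGION(p, n)
#define VALGRIND_MAKE_MEM_NOACCESS(p, n)   ASAN_POISON_MEMORY_REGION(p, n)
#else
#define VALGRIND_MAKE_MEM_UNDEFINED(p, n)  ((void) 0)
#define VALGRIND_MAKE_MEM_NOACCESS(p, n)   ((void) 0)
#endif

static void
wipe_mem(void *ptr, size_t size)
{
    /* make the region addressable long enough to clobber it... */
    VALGRIND_MAKE_MEM_UNDEFINED(ptr, size);
    memset(ptr, 0x7F, size);
    /* ...then poison it so any later access gets reported by the tool */
    VALGRIND_MAKE_MEM_NOACCESS(ptr, size);
}

With that, callers never mention the tool macros at all; wipe_mem (and randomise_mem) would be defined unconditionally and the ifdefs collapse into this one spot.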
Re: [HACKERS] [COMMITTERS] packing/alignment annotation for ItemPointerData redux
Sorry -- with the obvious error fixed:

$ /usr/bin/clang-4.0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -Wall -c clang-bug.c
clang-bug.c:55:9: warning: taking address of packed member 'ip_blkid' of class or structure 'ItemPointerData' may result in an unaligned pointer value [-Waddress-of-packed-member]
        return ItemPointerGetBlockNumber(&ip);
               ^~
clang-bug.c:49:25: note: expanded from macro 'ItemPointerGetBlockNumber'
        BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
                              ~~~^~~~
clang-bug.c:39:19: note: expanded from macro 'BlockIdGetBlockNumber'
        (BlockNumber) (((blockId)->bi_hi << 16) | ((uint16) (blockId)->bi_lo)) \
                       ^~~
clang-bug.c:55:9: warning: taking address of packed member 'ip_blkid' of class or structure 'ItemPointerData' may result in an unaligned pointer value [-Waddress-of-packed-member]
        return ItemPointerGetBlockNumber(&ip);
               ^~
clang-bug.c:49:25: note: expanded from macro 'ItemPointerGetBlockNumber'
        BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
                              ~~~^~~~
clang-bug.c:39:55: note: expanded from macro 'BlockIdGetBlockNumber'
        (BlockNumber) (((blockId)->bi_hi << 16) | ((uint16) (blockId)->bi_lo)) \
                                                            ^~~
2 warnings generated.

#if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
#define pg_attribute_aligned(a) __attribute__((aligned(a)))
#define pg_attribute_packed() __attribute__((packed))
#endif

typedef unsigned BlockNumber;
typedef unsigned short uint16;
typedef uint16 OffsetNumber;

typedef struct BlockIdData
{
    uint16 bi_hi;
    uint16 bi_lo;
} BlockIdData;

typedef struct ItemPointerData
{
    BlockIdData ip_blkid;
    OffsetNumber ip_posid;
}
/* If compiler understands packed and aligned pragmas, use those */
#if defined(pg_attribute_packed) && defined(pg_attribute_aligned)
pg_attribute_packed() pg_attribute_aligned(2)
#endif
ItemPointerData;

typedef ItemPointerData *ItemPointer;

/*
 * BlockIdGetBlockNumber
 *      Retrieve the block number from a block identifier.
 */
#define BlockIdGetBlockNumber(blockId) \
( \
    (BlockNumber) (((blockId)->bi_hi << 16) | ((uint16) (blockId)->bi_lo)) \
)

/*
 * ItemPointerGetBlockNumber
 *      Returns the block number of a disk item pointer.
 */
#define ItemPointerGetBlockNumber(pointer) \
( \
    BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
)

int main()
{
    ItemPointerData ip;
    return ItemPointerGetBlockNumber(&ip);
}

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] packing/alignment annotation for ItemPointerData redux
Ah. Here we go:

$ /usr/bin/clang-4.0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -Wall -c clang-bug.c
clang-bug.c:54:9: error: use of undeclared identifier 'BlockNumber'
        return ItemPointerGetBlockNumber(&ip);
               ^
clang-bug.c:48:2: note: expanded from macro 'ItemPointerGetBlockNumber'
        BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
        ^
clang-bug.c:38:3: note: expanded from macro 'BlockIdGetBlockNumber'
        (BlockNumber) (((blockId)->bi_hi << 16) | ((uint16) (blockId)->bi_lo)) \
        ^
1 error generated.

Preprocessor output:

# 1 "clang-bug.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 317 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "clang-bug.c" 2

typedef unsigned short uint16;
typedef uint16 OffsetNumber;

typedef struct BlockIdData
{
    uint16 bi_hi;
    uint16 bi_lo;
} BlockIdData;

typedef struct ItemPointerData
{
    BlockIdData ip_blkid;
    OffsetNumber ip_posid;
} __attribute__((packed)) __attribute__((aligned(2))) ItemPointerData;

typedef ItemPointerData *ItemPointer;
# 51 "clang-bug.c"
int main()
{
    ItemPointerData ip;
    return ( ( (BlockNumber) (((&(&ip)->ip_blkid)->bi_hi << 16) | ((uint16) (&(&ip)->ip_blkid)->bi_lo)) ) );
}

#if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__)
#define pg_attribute_aligned(a) __attribute__((aligned(a)))
#define pg_attribute_packed() __attribute__((packed))
#endif

typedef unsigned short uint16;
typedef uint16 OffsetNumber;

typedef struct BlockIdData
{
    uint16 bi_hi;
    uint16 bi_lo;
} BlockIdData;

typedef struct ItemPointerData
{
    BlockIdData ip_blkid;
    OffsetNumber ip_posid;
}
/* If compiler understands packed and aligned pragmas, use those */
#if defined(pg_attribute_packed) && defined(pg_attribute_aligned)
pg_attribute_packed() pg_attribute_aligned(2)
#endif
ItemPointerData;

typedef ItemPointerData *ItemPointer;

/*
 * BlockIdGetBlockNumber
 *      Retrieve the block number from a block identifier.
 */
#define BlockIdGetBlockNumber(blockId) \
( \
    (BlockNumber) (((blockId)->bi_hi << 16) | ((uint16) (blockId)->bi_lo)) \
)

/*
 * ItemPointerGetBlockNumber
 *      Returns the block number of a disk item pointer.
 */
#define ItemPointerGetBlockNumber(pointer) \
( \
    BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
)

int main()
{
    ItemPointerData ip;
    return ItemPointerGetBlockNumber(&ip);
}

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] packing/alignment annotation for ItemPointerData redux
On Wed, Oct 19, 2016 at 5:20 PM, Tom Lane wrote:
> Don't know how that version number compares to "3.8.0".

Argh. Just to further confuse matters, apparently the warnings are from clang 4.0. I had been testing with 3.8.0 earlier but updated at some point. And I haven't been able to reproduce them with a minimal test case yet.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] packing/alignment annotation for ItemPointerData redux
[resending to -hackers, sorry]

Back in 2001 a hack to add __attribute__((packed)) to ItemPtr was added with a comment "Appropriate whack upside the head for ARM" (dcbbdb1b3ee). I don't know if this is still a factor in 2016 or not, but it has already resulted in some collateral damage: in 2015 some compiler took that as license to align the whole struct on single-byte alignment when it was buried inside another struct (d4b538ea367de).

I just tried compiling with Clang 3.8.0 and got tons of warnings about this because:

'ItemPointerData' may result in an unaligned pointer value [-Waddress-of-packed-member]
        ...ItemPointerGetBlockNumber(&(xlrec->target_tid)),
           ^~~
../../../src/include/storage/itemptr.h:69:25: note: expanded from macro 'ItemPointerGetBlockNumber'
        BlockIdGetBlockNumber(&(pointer)->ip_blkid) \
                              ~~~^~~~
../../../src/include/storage/block.h:118:19: note: expanded from macro 'BlockIdGetBlockNumber'
        (BlockNumber) (((blockId)->bi_hi << 16) | ((uint16) (blockId)->bi_lo)) \
                       ^~~

Which seems to indicate that clang may not understand the "pg_attribute_aligned(2)", or perhaps it does and just doesn't take it into account when generating these warnings.

I'm sure there are other people testing clang -- isn't it the default on MacOS? Do they not see these warnings?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On Sat, Feb 6, 2016 at 4:52 AM, Noah Misch wrote:
> aset.c relies on the fact that VALGRIND_MEMPOOL_ALLOC() has an implicit
> VALGRIND_MAKE_MEM_UNDEFINED() and VALGRIND_MEMPOOL_FREE() has an implicit
> VALGRIND_MAKE_MEM_NOACCESS(). #define those two accordingly. If ASAN has no

Actually this is confusing. aset.c doesn't actually use the MEMPOOL_ALLOC macro at all; it just calls UNDEFINED, DEFINED, and NOACCESS. mcxt.c, however, does do the MEMPOOL_ALLOC/FREE. So both layers are calling these macros for overlapping memory areas, which I find very confusing, and I'm not sure what the net effect is.

The MEMPOOL_FREE doesn't take any size argument, and mcxt.c doesn't have convenient access to a size argument. It could call the GetChunkSpace method, but that will include the allocation overhead, and in any case isn't this memory already marked noaccess by aset.c?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
On Mon, Oct 10, 2016 at 9:52 PM, Greg Stark wrote:
>
> The code is here:
>
> https://github.com/gsstark/retropg
>
> The build script is called "makeall" and it applies patches from the
> "old-postgres-fixes" directory though some of the smarts are in the
> script because it knows how to run older versions of the
> configure script and it tries to fix up the ecpg parser duplicate
> tokens separately. It saves a diff after applying the patches and
> other fixups into the "net-diffs" directory but I've never checked if
> those diffs would work cleanly on their own.

Fwiw, I was considering proposing committing some patches for these old releases to make them easier to build. I would suggest creating a tag for this stable legacy version and limiting the commits to just:

1) Disabling warnings
2) Fixing bugs in the configure scripts that occur on more recent systems (version number parsing etc)
3) Backporting things like the variable-length array code that prevents building
4) Adding compiler options like -fwrapv

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
On Mon, Oct 10, 2016 at 3:36 AM, Jim Nasby wrote:
> FWIW, Greg Stark did a talk at PG Open about PG performance going back to at
> least 7.4. He did discuss what he had to do to get those versions to compile
> on modern tools, and has a set of patches that enable it. Unfortunately his
> slides aren't posted[1] so I can't provide further details than that.

The code is here:

https://github.com/gsstark/retropg

The build script is called "makeall" and it applies patches from the "old-postgres-fixes" directory, though some of the smarts are in the script because it knows how to run older versions of the configure script, and it tries to fix up the ecpg parser duplicate tokens separately. It saves a diff after applying the patches and other fixups into the "net-diffs" directory, but I've never checked whether those diffs would apply cleanly on their own.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is it time to kill support for very old servers?
On Fri, Oct 7, 2016 at 3:06 PM, Tom Lane wrote:
> pg_dump alleges support for dumping from servers back to 7.0. Would v10
> be a good time to remove some of that code? It's getting harder and
> harder to even compile those ancient branches, let alone get people to
> test against them (cf. 4806f26f9). My initial thought is to cut support
> for pre-7.3 or maybe pre-7.4 servers, as that would allow removal of
> support for cases where the server lacks schemas or pg_depend, each of
> which requires a fair deal of klugery in pg_dump.

I might be expected to be the holdout here, but it seems sensible to me. Removing the code in pg_dump that deals with lacking schemas and pg_depend seems like a major simplification.

> In the same line, maybe we should kill libpq's support for V2 protocol
> (which would make the cutoff 7.4). And maybe the server's support too,
> though that wouldn't save very much code. The argument for cutting this
> isn't so much that we would remove lots of code as that we're removing
> code that never gets tested, at least not by us.

If it's testing we're concerned about, IIRC the current servers can be arm-twisted into speaking the V2 protocol, so it should be possible to test both modern servers and modern pg_dump using the V2 protocol with a simple tweak. Somehow removing the whole protocol support seems a bit different to me than removing pg_dump logic. For one thing, it's nice to be able to run a modern psql against old servers so you can run a benchmark script. For another, there may be binary-only applications or drivers out there that are using V2 for whatever reason.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Macro customizable hashtable / bitmapscan & aggregation perf
On Sat, Oct 1, 2016 at 1:44 AM, Andres Freund wrote:
>
> Unfortunately I'm running out of battery right now, so I don't want to
> re-run the benchmarks posted upthread (loading takes a while). But the
> last time I ran them all the results after the patches were better than
> before.

I have a machine sitting idle now too, if you have specific ideas of what to benchmark.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On Fri, Sep 30, 2016 at 2:11 AM, Robert Haas wrote:
> For one thing, we can stop shipping a totally broken feature in release after
> release

For what it's worth I'm for any patch that can accomplish that. In retrospect I think we should have done the hash-over-btree thing ten years ago, but we didn't, and if Amit's patch makes hash indexes recoverable today then go for it.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Order of operations in SubPostmasterMain()
On Thu, Sep 29, 2016 at 8:46 PM, Tom Lane wrote:
> We could probably refactor things enough so that we do pq_init()
> before PGSharedMemoryReAttach(). It would be a little bit ugly,
> and it would fractionally increase the chance of a reattach failure
> because pq_init() palloc's a few KB worth of buffers. I'm not quite
> sure if it's worth it; thoughts? In any case the mentioned comments
> are obsolete and need to be moved/rewritten.

Just speaking off the cuff without reviewing the code in detail... Alternately, we could call pq_init in the error path if it hasn't been called yet. I'm sure there are problems with doing that in general, but for the specific errors that can happen before pq_init it might be feasible, since they obviously can't have very much shared state yet to have corrupted.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On Wed, Sep 28, 2016 at 7:40 AM, Piotr Stefaniak wrote:
> Not remembering the context, I was initially confused about what exactly
> supposedly needs to be done in order to have ASan support, especially
> since I've been using it for a couple of years without any kind of
> modifications. Having read the whole thread now, I assume the discussion
> is now about getting MSan support, since apparently it's been already
> concluded that not much is needed for getting ASan support:

The difference between msan and asan is only related to whether uninitialized reads are tracked. All other memory errors, such as reading past the end of an allocation or reading after a free, are tracked by both.

Without asan support in Postgres's memdebug.h, asan will just track whether you're using memory that is outside the memory that malloc has handed Postgres. It doesn't know anything about whether that memory has been returned by palloc or has since been pfree'd. Even the bounds checking is not great, since you could be reading from palloc's header or from the bytes in the next palloc chunk that happened to be returned by the same malloc (or another malloc if you're unlucky).

The support I added to memdebug.h calls macros which call llvm intrinsics to mark the memory malloc'd by Postgres as unusable until it's returned by palloc. Once it's returned by palloc it's marked usable except for a "guard" byte at the end. Then pfree marks the memory unusable again. This basically mimics the behaviour you would get from asan if you were using malloc directly; the sketch below shows the idea.

I added support for msan as well, which is basically just one more macro to make the distinction between usable-but-uninitialized memory and usable-and-initialized memory. But I was unable to test it because msan didn't work for me at all. This seems to be the way of things with llvm. It's great stuff, but there's always 10% that is broken because there's some cool new thing that's better.
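To make the mechanism concrete, here's a toy standalone model of the poisoning scheme described above. It uses the real ASan interface (sanitizer/asan_interface.h) but none of the real allocator code; the names and the one-byte guard are illustrative. Compile with clang -fsanitize=address:

#include <stdlib.h>
#include <sanitizer/asan_interface.h>

static void *
toy_palloc(size_t size)
{
    /* over-allocate to leave a one-byte guard after the chunk */
    char       *chunk = malloc(size + 1);

    __asan_unpoison_memory_region(chunk, size);     /* usable... */
    __asan_poison_memory_region(chunk + size, 1);   /* ...except the guard */
    return chunk;
}

static void
toy_pfree(void *ptr, size_t size)
{
    /*
     * Unusable again until the pool hands it out once more.  A real pool
     * keeps the chunk for reuse; calling free() here would defeat the demo
     * since ASan tracks malloc/free itself.
     */
    __asan_poison_memory_region(ptr, size + 1);
}

int
main(void)
{
    char       *p = toy_palloc(8);

    p[0] = 'x';                 /* fine */
    toy_pfree(p, 8);
    /* p[0] = 'y';              would now be reported as a poisoned access */

    /* clean up for the demo so LeakSanitizer stays quiet */
    __asan_unpoison_memory_region(p, 9);
    free(p);
    return 0;
}

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers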
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On Tue, Sep 27, 2016 at 11:02 PM, Andres Freund wrote:
> Any plans to pick this up again?

Yeah, I was just thinking I should pick this up again.

> I vote for renaming the VALGRIND names etc. to something more tool-neutral. I
> think it's going to be too confusing otherwise.

Hm, the danger there is that once I start refactoring things I could get bogged down... I would love to remove all the #ifdefs and have the macros just be no-ops if they're compiled out, for example...

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On Thu, Sep 22, 2016 at 3:23 AM, Tom Lane wrote:
> But to kick the hash AM as such to the curb is to say
> "sorry, there will never be O(1) index lookups in Postgres".

Well, there's plenty of halfway solutions for that. We could move hash indexes to contrib, or even have them in core as experimental_hash or unlogged_hash until the day they achieve their potential.

We definitely shouldn't discourage people from working on hash indexes, but we probably shouldn't have released ten years' worth of a feature marked "please don't use this" that's guaranteed to corrupt your database and cause weird problems if you use it in any of a number of supported situations (including non-replicated system recovery, which has been a bedrock feature of Postgres for over a decade).

Arguably adding a hashed btree opclass and relegating the existing code to an experimental state would actually encourage development, since a) users would actually be likely to use the hashed btree opclass, so any work on a real hash opclass would have a real userbase ready and waiting for delivery, b) delivering a real hash opclass wouldn't involve convincing users to unlearn a million instructions warning not to use this feature, and c) the fear of breaking existing users' use cases and databases would be less, and pg_upgrade would be an ignorable problem, at least until the day comes for the big cutover of the default to the new opclass.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GiST penalty functions [PoC]
On Thu, Sep 8, 2016 at 9:29 AM, Andrew Borodin wrote:
>> autoconf check for IEEE 754 floats
> The Autoconf manual says the following:
>> it is safe to assume IEEE-754 in most portable code these days
> https://www.gnu.org/software/autoconf/manual/autoconf.html#Floating-Point-Portability

Personally I wouldn't be very happy about an IEEE 754 assumption. I did go to the trouble of testing Postgres on a VAX, and we fixed the few instances where it had such dependencies, for a net gain. If we intentionally put in a dependency in one place then we'll never be able to detect unintentional dependencies anywhere else.

I haven't followed the thread closely, but I'm puzzled why you would need to use bit twiddling to set a floating point number. Isn't there a perfectly good way to calculate the value you want using ldexp() and other standard C library functions? Something like the example below.
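For instance -- a standalone example of the kind of thing I mean, with "smallest power of two greater than x" picked arbitrarily as the target value -- frexp() and ldexp() let you read and set the binary exponent without touching the bit layout at all:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double      x = 1234.5;
    int         exp;

    /* x == mantissa * 2^exp, with mantissa in [0.5, 1) */
    double      mantissa = frexp(x, &exp);

    /* 2^exp: the smallest power of two strictly greater than x */
    double      next_pow2 = ldexp(1.0, exp);

    printf("%g = %g * 2^%d; next power of two: %g\n",
           x, mantissa, exp, next_pow2);
    return 0;
}

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers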
Re: [HACKERS] Tuplesort merge pre-reading
On Fri, Sep 9, 2016 at 1:01 PM, Heikki Linnakangas wrote:
> I'm happy with what it looks like. We are in fact getting a more sequential
> access pattern with these patches, because we're not expanding the pre-read
> tuples into SortTuples. Keeping densely-packed blocks in memory, instead of
> SortTuples, allows caching more data overall.

Wow, this is really cool. We should do something like this for query execution too.

I still didn't follow exactly why removing the prefetching allows more sequential i/o. I thought the whole point of prefetching was to reduce the random i/o from switching tapes.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Wed, Sep 7, 2016 at 1:45 PM, Simon Riggs wrote:
> On 6 September 2016 at 19:59, Tom Lane wrote:
>
>> The idea of looking to the stats to *guess* about how many tuples are
>> removable doesn't seem bad at all. But imagining that that's going to be
>> exact is folly of the first magnitude.
>
> Yes. Bear in mind I had already referred to allowing +10% to be safe,
> so I think we agree that a reasonably accurate, yet imprecise
> calculation is possible in most cases.

That would all be well and good if it weren't trivial to do what Robert suggested. This is just a large unsorted list that we need to iterate through. Just allocate chunks of a few megabytes, and when one is full allocate a new chunk and keep going. There's no need to get tricky with estimates and resizing and whatever. A sketch of what I mean is below.
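A minimal standalone sketch of the chunked list I have in mind -- the chunk size, the names, and the TID stand-in are all arbitrary, and the real thing would use palloc and the actual ItemPointerData:

#include <stdlib.h>

typedef unsigned long TidRepr;  /* stand-in for ItemPointerData */

#define CHUNK_TIDS ((8 * 1024 * 1024) / sizeof(TidRepr))    /* ~8MB per chunk */

typedef struct TidChunk
{
    struct TidChunk *next;
    size_t      used;
    TidRepr     tids[CHUNK_TIDS];
} TidChunk;

typedef struct DeadTupleList
{
    TidChunk   *head;           /* chunks kept in insertion (heap) order */
    TidChunk   *tail;
} DeadTupleList;

static void
add_dead_tuple(DeadTupleList *list, TidRepr tid)
{
    if (list->tail == NULL || list->tail->used == CHUNK_TIDS)
    {
        /* no estimating, no resizing: just grab another chunk */
        TidChunk   *c = malloc(sizeof(TidChunk));

        c->next = NULL;
        c->used = 0;
        if (list->tail)
            list->tail->next = c;
        else
            list->head = c;
        list->tail = c;
    }
    list->tail->tids[list->tail->used++] = tid;
}

Since insertion order is heap order, each chunk stays internally sorted, so the index-vacuum pass can still do its ordered lookups chunk by chunk.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers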
Re: [HACKERS] pg_sequence catalog
On Wed, Aug 31, 2016 at 3:01 PM, Tom Lane wrote:
>
> Uh, not as subtly as all that, because "select * from sequence" will
> now return a different set of columns, which will flat out break a
> lot of clients that use that method to get sequence properties.

So? Surely clients expect changes like this between major releases. Subtle changes that cause silent breakage for end-users seem scarier than unsubtle breakage that tool authors can fix.

On Wed, Aug 31, 2016 at 7:30 PM, Andres Freund wrote:
> Isn't that *precisely* when it's going to matter? If you have 5 active
> tables & sequences where the latter previously used independent locks,
> and they'd now be contending on a single lock. If you have hundreds of
> thousands of active sequences, I doubt individual page locks would
> become a point of contention.

I think even two sequences could be a point of contention if you, for example, are using COPY to load data into two otherwise completely independent tables in two separate processes. But that just means the row needs to be padded out to a cache line, no? Something like the sketch below. Or was the concern about things like trying to pin the same page, parse the same page header, follow nearby line pointers...? I'm not sure how effective all that caching is today, but it doesn't seem impossible to imagine getting it all optimized away.
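By "padded out to a cache line" I mean something like this standalone sketch; the field list and the 64-byte line size are assumptions for illustration:

#include <stdint.h>

#define CACHE_LINE_SIZE 64

typedef struct SequenceRow
{
    int64_t     last_value;
    int64_t     log_cnt;
    uint32_t    is_called;
} SequenceRow;

/* pad each row so two hot sequences never share a cache line */
typedef union SequenceRowPadded
{
    SequenceRow row;
    char        pad[CACHE_LINE_SIZE];
} SequenceRowPadded;

An array of these keeps two hot sequences from ever sharing a cache line, so two COPYs bumping different sequences don't false-share.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers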
Re: [HACKERS] autonomous transactions
On Sat, Sep 3, 2016 at 12:09 PM, Simon Riggs wrote:
> So doing autonomous transactions inside a single backend doesn't gain
> you very much, yet it is an enormously invasive patch to do it that
> way, not least because it requires you to rewrite locking and
> deadlocks to make them work correctly when proc is not 1:1 with xid.
> And as Serge points out it introduces other restrictions that we know
> about now, perhaps more as well.

Well, using a separate process also requires rewriting locking and deadlock detection, since a reasonable user might expect that second process to have access to data locked in their current transaction. The plus side is that we're already facing that issue with parallel query, so at least it's something that only has to be solved once instead of being a new problem.

Parallel query is currently restricted to read-only queries, however. Autonomous transactions will certainly need to be read-write, so the question then is what problems led to the restriction in parallel query, and are they any more tractable with autonomous transactions?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autonomous transactions
On Wed, Aug 31, 2016 at 3:11 PM, Craig Ringer wrote:
>
> I suspect that there'll be way too much code that relies on stashing
> xact-scoped stuff in globals for that to be viable. Caches alone.
> Peter will be able to explain more, I'm sure.
>
> We'd probably need a new transaction data object that everything
> xact-scoped hangs off, so we can pass it everywhere or swap it out of
> some global. The mechanical refactoring alone would be pretty scary,
> not to mention the complexity of actually identifying all the less
> obvious places that need changing.

Well, this is the converse of the same problem. Today process state and transaction state are tied together. One way or another you're trying to split that -- either by having two processes share state or by having one process manage two transactions. I suppose we already have the infrastructure for parallel query, so there's at least some shared problem space there.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autonomous transactions
On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut wrote:
> - An API interface to open a "connection" to a background worker, run
> queries, get results: AutonomousSessionStart(), AutonomousSessionEnd(),
> AutonomousSessionExecute(), etc. The communication happens using the
> client/server protocol.

I'm surprised by the background worker. I had envisioned autonomous transactions being implemented by allowing a process to reserve a second entry in PGPROC with the same pid, or perhaps by saving its existing information in a separate PGPROC slot (or a stack of slots) and restoring it after the autonomous transaction commits.

Using a background worker means that the autonomous transaction can't access any state from the process memory. Parameters in plpgsql are a symptom of this, but I suspect there will be others. What happens if a statement timeout occurs during an autonomous transaction? What happens if you use a pl language in the autonomous transaction and it tries to use non-transactional information such as prepared statements?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pinning a buffer in TupleTableSlot is unnecessary
On Tue, Aug 30, 2016 at 11:12 AM, Heikki Linnakangas wrote:
> While profiling some queries and looking at executor overhead, I realized
> that we're not making much use of TupleTableSlot's ability to hold a buffer
> pin. In a SeqScan, the buffer is held pinned by the underlying heap-scan
> anyway. Same with an IndexScan, and the SampleScan. The only thing that
> relies on that feature is TidScan, but we could easily teach TidScan to hold
> the buffer pin directly.
>
> So, how about we remove the ability of a TupleTableSlot to hold a buffer
> pin, per the attached patch? It shaves a few cycles from a ExecStoreTuple()
> and ExecClearTuple(), which get called a lot. I couldn't measure any actual
> difference from that, though, but it seems like a good idea from a
> readability point of view, anyway.

Out of curiosity, why go in this direction and not the other? Why not move those other scans to use the TupleTableSlot API to manage the pins? Offhand it sounds more readable, not less, to have the TupleTableSlot be a self-contained data structure that guarantees the lifetime of the pins matches that of the slots, rather than to have a dependency on the code structure in some far-away scan.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix some corner cases that cube_in rejects
On Mon, Aug 29, 2016 at 7:19 PM, Tom Lane wrote:
> To deal with the infinity/NaN issues, I made cube_in and cube_out rely
> on float8in_internal and float8out_internal, as we recently did for the
> core geometric types. That causes the response to "1e-700" to be an
> out-of-range error rather than silent underflow, which seems to me to
> be fine, especially since it removes the platform dependency that's
> responsible for needing the separate cube_1.out and cube_3.out files.

So what happens to a database that has invalid cubes in it already? Offhand I suspect they mostly become valid, since float8in will handle NaN and the like.

Incidentally, I so wish this module were named "vector" instead of "cube". I was confused by it for ages, and it still confuses me for a few moments before I remember what it does.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
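For what it's worth, a sketch of the behaviour change being described, assuming the patched cube_in accepts exactly what float8in accepts (the error text below is float8in_internal's, from memory):

SELECT 'NaN'::cube;                     -- previously platform-dependent; now accepted
SELECT '(-Infinity),(Infinity)'::cube;  -- likewise

SELECT '1e-700'::cube;
-- previously silent underflow to 0 on most platforms; now something like:
-- ERROR:  "1e-700" is out of range for type double precision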
[HACKERS] Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
On Tue, Aug 23, 2016 at 4:15 PM, Aleksander Alekseev wrote:
> Frankly I have much more faith in Tom's idea of using virtual part of the
> catalog for all temporary tables, i.e turning all temporary tables into
> "fast" temporary tables. Instead of introducing a new type of temporary tables
> that solve catalog bloating problem and forcing users to rewrite applications
> why not just not to create a problem in a first place?

Would applications really need to be rewritten? Are they really constructing temporary tables where the definition of the table is dynamic, not just the content? I think application authors would be pretty happy not to have to keep recreating the same tables over and over again and dealing with DDL in their run-time code. It's not really rewriting an application to just remove that DDL and move it to the one-time database schema creation.

I think it's clear we got the idea of temporary tables wrong when we implemented them, and that the SQL standard's approach is more useful. It's not just some implementation artifact that the standard's tables can be implemented efficiently. It's a fundamental design difference, and experience shows that separating out the DDL and keeping it static while the DML stays dynamic is simply a better design.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
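To make the contrast concrete, roughly what the two models look like (the session_scratch table is made up; note PostgreSQL parses the GLOBAL keyword but does not implement the standard semantics -- it just creates an ordinary per-session temp table):

-- Today: every application session repeats the DDL at run time,
-- creating and dropping catalog entries each time:
CREATE TEMP TABLE session_scratch (id int, payload text)
    ON COMMIT DELETE ROWS;

-- SQL-standard style: the definition is created once, at schema
-- creation time; each session then runs only DML against its own
-- private, initially empty instance of the table:
CREATE GLOBAL TEMPORARY TABLE session_scratch (id int, payload text)
    ON COMMIT DELETE ROWS;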
Re: [HACKERS] Are these supported??
On Wed, Aug 17, 2016 at 4:22 PM, Robert Haas wrote: > We've supported having joins in a DELETE since PostgreSQL 8.1. Er, yes. Though he does say he's trying to use the same syntax as select... -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Are these supported??
On Wed, Aug 17, 2016 at 4:33 AM, Vince Vielhaber wrote:
>
> I recently moved a mybb forum away from mysql to postgres. Along the way I
> encountered a couple of things that either didn't seem to be supported or
> I'm just not doing it right.
>
> First, the server this is on is running version 8.4.22. php is 5.6.22.

8.4 is very old; it's been unsupported for two years already.

You can't write joins in a DELETE the way you do in a SELECT -- which table would it actually delete from? You can use a subselect to look up information from other tables in your DELETE, though.

EXTRACT and date_part have no idea where the data they're passed came from; it can come from tables, other functions, or expressions. The error you quoted, however, indicates that dateline is of type integer. The syntax for EXTRACT is confusing (blame the SQL committee...) but you don't want the extra "timestamp" keyword before the column there -- in the documentation's examples that keyword is part of the literal, making the string be read as a timestamp.

https://www.postgresql.org/docs/8.4/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
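Concretely, something like this (dateline comes from the problem description; the mybb_posts/mybb_threads names and columns are just illustrative guesses at the schema):

-- What the docs' examples show: "TIMESTAMP" introduces a literal,
-- it is not a cast of a column:
SELECT EXTRACT(YEAR FROM TIMESTAMP '2001-02-16 20:38:40');

-- With an integer epoch column, convert it first:
SELECT EXTRACT(YEAR FROM to_timestamp(dateline)) FROM mybb_posts;

-- And a DELETE that consults another table via a subselect rather
-- than a join:
DELETE FROM mybb_posts
 WHERE tid IN (SELECT tid FROM mybb_threads WHERE visible = 0);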
Re: [HACKERS] Why we lost Uber as a user
On Wed, Aug 17, 2016 at 1:36 AM, Jim Nasby wrote:
> Something I didn't see mentioned that I think is a critical point: last I
> looked, HOT standby (and presumably SR) replays full page writes. That means
> that *any* kind of corruption on the master is *guaranteed* to replicate to
> the slave the next time that block is touched. That's completely the
> opposite of trigger-based replication.

Yes, this is exactly what it should be doing, and exactly why it's useful. Physical replication accurately replicates the data from the master, "corruption" included, whereas a logical replication system will not, causing divergence and possible issues during a failover.

Picture yourself as Delta: you have a fire in your data centre and go to fail over to your secondary site. Your DBAs inform you that the secondary site has "fixed" some corruption that you were unaware of and that wasn't causing any issues -- and now, in the middle of the business crisis, is when you get to spend time identifying and repairing the problem, because your business logic has suddenly started running into problems.

Physical replication addresses the same use cases as physical backups. Both provide you with exactly what you had prior to the recovery, no more and no less. That's what you want when recovering from a disaster.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] bigint txids vs 'xid' type, new txid_recent(bigint) => xid
On Tue, Aug 16, 2016 at 10:15 AM, Craig Ringer wrote:
> I'm surprised the 32-bit xid was ever exposed to the user, rather than a
> 64-bit epoch-extended xid.

Once upon a time we didn't have epoch counting at all.

I don't think it would be a bad idea to clean up everything to do with xids so that everything user-facing is epoch-aware. Of course you don't always have the epoch available, but if we're careful about where users can see xids, they should never see an xid from an old epoch. That could be a problem for internal tools like pageinspect or xlogdump, but it shouldn't be a problem for any real production API.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
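For reference, txid_current() already returns the epoch-extended 64-bit form; pulling it apart is simple bit arithmetic (all three calls below run in one transaction, so they see the same value):

-- Epoch in the high 32 bits, the on-disk xid in the low 32:
SELECT txid_current()                           AS txid64,
       txid_current() >> 32                     AS epoch,
       txid_current() & ((1::bigint << 32) - 1) AS xid32;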
Re: [HACKERS] PSA: Systemd will kill PostgreSQL
On Tue, Aug 16, 2016 at 12:41 AM, Josh Berkus wrote:
> Presumably people just need to add the system account tag to the unit
> file, no?

That's a system-level change, though. How would a normal user manage this?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Let's get rid of the separate minor version numbers for shlibs
On Mon, Aug 15, 2016 at 11:45 PM, Peter Eisentraut wrote:
> If we instead installed
>
> libpq.so.5 (actual file)
> libpq.so -> libpq.so.5
>
> nothing would effectively change.

It would make it impossible to have multiple versions installed. One doesn't normally have multiple versions with the same soname installed, since the linker will only actually use one of them anyway, but having minor versions makes transitions and debugging a lot easier to manage. I don't think it would be very standard to have no minor version at all.

Also, the above is only true for Linux; other operating systems may set things up differently. IIRC NetBSD did something a lot more sophisticated, where it linked to the correct minor version, or gave a warning and linked with a newer minor version, but refused to link to older minor versions even when the major version was right -- or something like that.

But I think Tom was not proposing dropping the minor version at all, just setting the minor version equal to the release. So our soname would be libpq.so.5.10, then libpq.so.5.11, and so on until we make an incompatible change, at which point it would become, say, libpq.so.6.12. I think it would be a bit confusing to see both the minor and major version bump, and to see a major version start at a high number, but sonames are pretty technical internal goo, so it's not too bad. I think it would be a reasonable idea. It does rule out the possibility of a minor bump in the soname for a point release, which as you point out wouldn't do much on Linux, but on other operating systems it might be a useful thing.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
On Sat, Aug 13, 2016 at 1:18 AM, Andrew Gierth wrote:
>
> Hmm? The code in _bt_findsplitloc and _bt_checksplitloc doesn't seem to
> agree with this.
>
> (Inserting on the high leaf page is a special case, which is where the
> fillfactor logic kicks in; that's why sequentially filled indexes are
> (by default) 90% full rather than 100%. But other pages split into
> roughly equal halves.)

Hm, I was going from this lore. I didn't realize the uneven split applied only to inserts near the end of the index. That's cleverer than I thought.

commit 1663f3383849968415d29965ef9bfdf5aac4d358
Author: Tom Lane
Date: Sat Sep 29 23:49:51 2001 +

    Tweak btree page split logic so that when splitting a page that is
    rightmost on its tree level, we split 2/3 to the left and 1/3 to the
    new right page, rather than the even split we use elsewhere. The idea
    is that when faced with a steadily increasing series of inserted keys
    (such as sequence or timestamp values), we'll end up with a btree
    that's about 2/3ds full not 1/2 full, which is much closer to the
    desired steady-state load for a btree. Per suggestion from Ann
    Harrison of IBPhoenix.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
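Easy enough to see at home with contrib/pgstattuple, incidentally -- a sequentially loaded index should report leaf density near the default 90% fillfactor rather than 50% (the seqload table is just a throwaway demo):

CREATE EXTENSION IF NOT EXISTS pgstattuple;

CREATE TABLE seqload (id int PRIMARY KEY);
INSERT INTO seqload SELECT g FROM generate_series(1, 1000000) g;

-- Sequential inserts keep splitting the rightmost page at fillfactor,
-- so avg_leaf_density should come out close to 90:
SELECT avg_leaf_density FROM pgstatindex('seqload_pkey');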