Re: [HACKERS] [PATCH] Support for foreign keys with arrays
On 6 April 2012 07:21, Peter Eisentraut wrote:
> On lör, 2012-03-24 at 10:01 +, Gianni Ciolli wrote:
>> ON (DELETE | UPDATE) actions for EACH foreign keys
>> ==================================================
>>
>>                    |    ON     |    ON     |
>>  Action            |  DELETE   |  UPDATE   |
>> -------------------+-----------+-----------+
>>  CASCADE           | Row       | Forbidden |
>>  SET NULL          | Row       | Row       |
>>  SET DEFAULT       | Row       | Row       |
>>  EACH CASCADE      | Element   | Element   |
>>  EACH SET NULL     | Element   | Element   |
>>  EACH SET DEFAULT  | Forbidden | Forbidden |
>>  NO ACTION         | -         | -         |
>>  RESTRICT          | -         | -         |
>> -------------------+-----------+-----------+
>>
> I took another fresh look at this feature after not having looked for a
> month or two. I think the functionality is probably OK, but I find the
> interfaces somewhat poorly named. Consider, "PostgreSQL adds EACH
> foreign keys" -- huh? I think the key word ELEMENT would be more
> descriptive and precise, and it also leaves the door open to other kinds
> of non-atomic foreign key relationships outside of arrays. EACH has no
> relationship with arrays. It might as well refer to each row.
>
> On the matter of the above chart, there has been a long back and forth
> about whether the row or the element case should be the default. Both
> cases are probably useful, but unfortunately you have now settled on
> making maximum destruction the default. Additionally, we would now have
> the case that sometimes, depending on some configuration elsewhere, an
> ON DELETE CASCADE deletes more than what was actually involved in the
> foreign key. What I'd suggest is to make both cases explicit. That is,
> forbid ON DELETE CASCADE altogether and make people write ON DELETE
> CASCADE ROW or ON DELETE CASCADE ELEMENT. In addition to making things
> more explicit and safer, it would again leave the door open to other
> kinds of relationships later.
+1

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq compression
On Fri, Jun 15, 2012 at 10:19 AM, Bruce Momjian wrote:
> On Thu, Jun 14, 2012 at 02:43:04PM -0400, Tom Lane wrote:
>> Euler Taveira writes:
>> > On 14-06-2012 02:19, Tom Lane wrote:
>> >> ... I especially think that importing bzip2
>> >> is a pretty bad idea --- it's not only a new dependency, but bzip2's
>> >> compression versus speed tradeoff is entirely inappropriate for this
>> >> use-case.
>> >
>> > I see, the idea is that bzip2 would be a compiled-in option (not
>> > enabled by default) just to give another compression option.
>>
>> I'm not particularly thrilled with "let's have more compression options
>> just to have options". Each such option you add is another source of
>> fail-to-connect incompatibilities (when either the client or the server
>> doesn't have it). Moreover, while there are a lot of compression
>> algorithms out there, a lot of them are completely unsuited for this
>> use-case. If memory serves, bzip2 for example requires fairly large
>> data blocks in order to get decent compression, which means you are
>> either going to get terrible compression or suffer very bad latency
>> when trying to apply it to a connection data stream.

Agreed. I think there are probably arguments to be had for supporting compression without openssl (see below), but I don't think we need to have a whole set of potentially incompatible ways of doing it. Picking one that's good for the common case and not completely crap for the corner cases would be a better choice (meaning bzip2 is probably a very bad choice).

>> So I've got very little patience with the idea of "let's put in some
>> hooks and then great things will happen". It would be far better all
>> around if we supported exactly one, well-chosen, method. But really
>> I still don't see a reason not to let openssl do it for us.
>
> Do we just need to document SSL's NULL encryption option?

Does the SSL NULL encryption+compression thing work if you're not using openssl?
For one thing, some of us still hold a hope to support non-openssl libraries in both libpq and the server side, so it's something that would need to be supported by the standard, and thus available in most libraries, not to invalidate that.

Second, we also have things like the JDBC driver and the .Net driver that don't use libpq. The JDBC driver uses the native Java SSL support, AFAIK. Does that one support the compression, and does it support controlling it?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Saving snapshots for later use
On 15.06.2012 06:19, Nikolas Everett wrote:
> I've been a PostgreSQL user for years and have a feature that I'd like
> to see implemented. I've started playing with the source and have a
> vague implementation plan for it, but before I go blundering around I'd
> like to run the idea by this list. So here it is: I'd like to be able to
> save the current snapshot and then at a later date roll the entire
> database back to that snapshot, essentially erasing everything that
> happened since the snapshot. Well, more like making all that stuff
> invisible. This would be really useful when testing applications whose
> databases take a while to set up and whose tests change the database in
> some way. It could also provide a way to recover from some horrible user
> error. Frankly, I'm more interested in this for testing, but having the
> ability to recover from errors would be nice.

Have you considered using filesystem snapshots?

> So I'm wondering a few things:
>
> 1. I _think_ all I have to do to store one of these snapshots is to
> store the current xid and a list of all in-progress transactions. I
> believe I should be able to reconstitute this with the commit log to
> determine visibility at that snapshot.
> 2. I _think_ I can save that information in a system table. It feels
> weird to use a table to store visibility information but part of me
> thinks that is ok. I can't put my finger on why I think that is ok
> though.
> 3. I'm not really sure at all what to do with other connections that are
> doing things during a snapshot restore. They might just have to get cut.
> For my purposes this would be ok but I can imagine that would be a
> problem.
> 4. I think not allowing the user to save a snapshot during a transaction
> would simplify the bookkeeping a ton. I know what I typed in (1)
> wouldn't cover snapshots inside transactions. I just don't think that is
> worth the effort.
> 5. On a more personal note I have no idea if I am capable of
> implementing this.
> I'm really not very good with C but I haven't had too much trouble
> figuring out how to add new parser rules or map them into the tcop
> layer. I figured out how to create a new system table without too much
> trouble but haven't a clue how upgrades work for the new table. I'm not
> sure how to read from the system table but I see there is an FAQ entry
> for that.
> 6. I think it'd be ok if restoring an older snapshot removes a newer
> snapshot. Using the snapshot mechanism to jump forward in time just
> sounds hard to keep straight.

To revert the database to the earlier state, you'll also need to somehow roll back all the already-committed transactions. At first sight, that seems easy - just modify clog to mark them as aborted. However, it's not that easy, because you'd also need to somehow clear hint bits that claim those transactions to be committed. Or prevent those hint bits from being set in the first place, but that affects performance, so I don't think that would be very useful for testing.

All in all, I think a filesystem snapshot is simpler. Or even just use a copy of the database, and use rsync to revert it back to the original state. You will have to restart postgres, though.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
[HACKERS] Saving snapshots for later use
I've been a PostgreSQL user for years and have a feature that I'd like to see implemented. I've started playing with the source and have a vague implementation plan for it, but before I go blundering around I'd like to run the idea by this list. So here it is: I'd like to be able to save the current snapshot and then at a later date roll the entire database back to that snapshot, essentially erasing everything that happened since the snapshot. Well, more like making all that stuff invisible. This would be really useful when testing applications whose databases take a while to set up and whose tests change the database in some way. It could also provide a way to recover from some horrible user error. Frankly, I'm more interested in this for testing, but having the ability to recover from errors would be nice.

Once you've saved the snapshot it'd be tempting to expose the ability to run queries against the old snapshot as well. This seems like more of a bonus in my mind, though.

From here on out I'm not sure if I have my terms right so please correct me if I'm wrong. So I'm wondering a few things:

1. I _think_ all I have to do to store one of these snapshots is to store the current xid and a list of all in-progress transactions. I believe I should be able to reconstitute this with the commit log to determine visibility at that snapshot.
2. I _think_ I can save that information in a system table. It feels weird to use a table to store visibility information but part of me thinks that is ok. I can't put my finger on why I think that is ok though.
3. I'm not really sure at all what to do with other connections that are doing things during a snapshot restore. They might just have to get cut. For my purposes this would be ok but I can imagine that would be a problem.
4. I think not allowing the user to save a snapshot during a transaction would simplify the bookkeeping a ton. I know what I typed in (1) wouldn't cover snapshots inside transactions.
I just don't think that is worth the effort.
5. On a more personal note I have no idea if I am capable of implementing this. I'm really not very good with C but I haven't had too much trouble figuring out how to add new parser rules or map them into the tcop layer. I figured out how to create a new system table without too much trouble but haven't a clue how upgrades work for the new table. I'm not sure how to read from the system table but I see there is an FAQ entry for that.
6. I think it'd be ok if restoring an older snapshot removes a newer snapshot. Using the snapshot mechanism to jump forward in time just sounds hard to keep straight.

I'd attach a patch of my work so far but it certainly isn't anything mind blowing at this point. Sorry to ramble on for a solid page. Thanks to everyone who made it to the end.

Cheers,
Nik
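[Editor's note: a minimal sketch of the visibility rule item (1) above relies on - a snapshot as (next xid, set of in-progress xids) plus a commit-log lookup. All names are hypothetical; this is a toy model, not PostgreSQL's actual MVCC code.]

```python
# Toy model of "save the current xid and a list of in-progress
# transactions, then reconstitute visibility with the commit log".

def save_snapshot(next_xid, in_progress_xids):
    """Capture the state item (1) proposes to store."""
    return {"xmax": next_xid, "in_progress": frozenset(in_progress_xids)}

def xid_visible(snapshot, xid, committed):
    """Would work done by `xid` be visible under the saved snapshot?

    committed: callable mapping xid -> bool, standing in for the clog.
    """
    if xid >= snapshot["xmax"]:
        return False              # transaction began after the snapshot
    if xid in snapshot["in_progress"]:
        return False              # was still running at snapshot time
    return committed(xid)         # otherwise defer to the commit log

snap = save_snapshot(next_xid=100, in_progress_xids={97, 99})
clog = {95: True, 96: True, 97: True, 98: False}.get
assert xid_visible(snap, 95, clog)       # committed before snapshot
assert not xid_visible(snap, 97, clog)   # in progress at snapshot time
assert not xid_visible(snap, 101, clog)  # began after snapshot
```

Note how 97 stays invisible even though the clog later says it committed: the in-progress list frozen at snapshot time takes precedence, which is exactly why it has to be saved.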
Re: [HACKERS] sortsupport for text
On Thu, Jun 14, 2012 at 6:30 PM, Peter Geoghegan wrote:
>> Here we know that it doesn't matter, so the application of Knuth's
>> first law of optimization is appropriate.
>
> I'm not advocating some Byzantine optimisation, or even something that
> could reasonably be described as an optimisation at all here. I'm
> questioning why you've unnecessarily complicated the code by having
> the buffer size just big enough to fit the biggest value seen so far,
> but arbitrarily aligned to a value that is completely irrelevant to
> bttextfastcmp_locale(), rather than using simple geometric expansion,
> which is more or less the standard way of managing the growth of a
> dynamic array.

Well, so... palloc, and most malloc implementations, don't actually just double every time forever. They double up to some relatively small number like 1MB, and then they expand in 1MB chunks. std::vector may well behave as you're describing, but that's doing something completely different. We're not accumulating a string of unknown length into a buffer, with the risk of having to copy the whole buffer every time we reallocate. We know the exact size of the buffer we need for any given string, so the cost of a pfree/palloc is only the cost of releasing and allocating memory; there is no additional copying cost, as there would be for std::vector.

Look at it another way. Right now, the worst case number of temporary buffer allocations is about 2N lg N, assuming we do about N lg N comparisons during a sort. If we simply implemented the most naive buffer reallocation algorithm possible, we might in the worst case reallocate each buffer up to N times. That is already better than what we do now by a factor of lg N. If we suppose N is about a million, that's an improvement of 20x *in the worst case* - typically, the improvement will be much larger, because all the strings will be of similar length or we won't hit them in exactly increasing order of length.
The algorithm I've actually implemented bounds the worst case 1024 times more tightly - given N strings, we can't need to enlarge the buffer more than N/1024 times. So with N of around a million, this algorithm should eliminate *at least* 99.995% of the pallocs we currently do. How much better does it need to be?

> You have to grow the array in some way. The basic approach I've
> outlined has something to recommend it - why does it make sense to
> align the size of the buffer to TEXTBUFLEN in particular though?

There's nothing magic about TEXTBUFLEN as far as that goes. We could round to the nearest multiple of 8kB or whatever if that seemed better. But if we rounded to, say, the nearest 1MB, then someone sorting strings that are 2kB long would use 2MB of memory for these buffers. Considering that we ship with work_mem = 1MB, that could easily end up wasting almost twice work_mem. I don't see how you can view that as a trivial problem. It might be worth sucking that up if it seemed likely to dramatically improve performance, but it doesn't.

> It's quite easy to imagine what you've done here resulting in an
> excessive number of allocations (and pfree()s), which *could* be
> expensive.

Actually, I can't imagine that at all, per the above analysis. If I could, I might agree with you. :-)

> There is a trade-off between space and time to be made here, but I
> don't know why you think that the right choice is to use almost the
> smallest possible amount of memory in all cases.

Because I think that with the current implementation I can have my cake and eat it, too. I believe I've gotten essentially all the available speedup for essentially no memory wastage. If you can convince me that I've missed something and there is still a meaningful amount of palloc overhead left to be squeezed out, I'm all ears.
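[Editor's note: the bound argued above is easy to check with a small simulation - grow-to-exact-need versus growing to the needed size rounded up to a multiple of TEXTBUFLEN, over a worst-case (sorted) stream of string lengths. This is an illustrative sketch, not the patch's actual C code; names and the 4000-byte length cap are arbitrary.]

```python
# Count how many times each growth policy must enlarge its buffer over a
# worst-case (increasing) sequence of string lengths.
import random

TEXTBUFLEN = 1024  # the rounding granularity discussed above

def count_enlargements(lengths, grow):
    size = enlargements = 0
    for n in lengths:
        if n > size:          # buffer too small: pfree/palloc a bigger one
            size = grow(n)
            enlargements += 1
    return enlargements

random.seed(42)
lengths = sorted(random.randint(1, 4000) for _ in range(100_000))

exact = count_enlargements(lengths, lambda n: n)  # exact-fit policy
rounded = count_enlargements(
    lengths, lambda n: -(-n // TEXTBUFLEN) * TEXTBUFLEN)  # round up

assert rounded <= 4      # at most ceil(4000 / TEXTBUFLEN) enlargements
assert exact > 1000      # exact fit reallocates once per new max length
```

With rounding, the enlargement count is bounded by maxlen/TEXTBUFLEN rather than by the number of distinct lengths, which is the "1024 times more tightly" claim in miniature.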
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] libpq compression
On Thu, Jun 14, 2012 at 02:43:04PM -0400, Tom Lane wrote:
> Euler Taveira writes:
> > On 14-06-2012 02:19, Tom Lane wrote:
> >> ... I especially think that importing bzip2
> >> is a pretty bad idea --- it's not only a new dependency, but bzip2's
> >> compression versus speed tradeoff is entirely inappropriate for this
> >> use-case.
>
> > I see, the idea is that bzip2 would be a compiled-in option (not
> > enabled by default) just to give another compression option.
>
> I'm not particularly thrilled with "let's have more compression options
> just to have options". Each such option you add is another source of
> fail-to-connect incompatibilities (when either the client or the server
> doesn't have it). Moreover, while there are a lot of compression
> algorithms out there, a lot of them are completely unsuited for this
> use-case. If memory serves, bzip2 for example requires fairly large
> data blocks in order to get decent compression, which means you are
> either going to get terrible compression or suffer very bad latency
> when trying to apply it to a connection data stream.
>
> So I've got very little patience with the idea of "let's put in some
> hooks and then great things will happen". It would be far better all
> around if we supported exactly one, well-chosen, method. But really
> I still don't see a reason not to let openssl do it for us.

Do we just need to document SSL's NULL encryption option?

--
Bruce Momjian http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
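[Editor's note: the latency point about stream-oriented codecs can be demonstrated with Python's stdlib: zlib can flush a complete, immediately decodable chunk at every message boundary, which is what a connection protocol needs. This is an illustrative sketch using stdlib zlib, not libpq code.]

```python
# A stream-friendly codec lets the sender flush at message boundaries so
# the receiver can decode each message without waiting for more input.
import zlib

comp = zlib.compressobj()
decomp = zlib.decompressobj()

def send(msg: bytes) -> bytes:
    """Compress one protocol message and flush it onto the wire."""
    return comp.compress(msg) + comp.flush(zlib.Z_SYNC_FLUSH)

wire = send(b"SELECT 1;")
# The receiver can decompress this message immediately:
assert decomp.decompress(wire) == b"SELECT 1;"
```

Block-oriented codecs like bzip2 offer no such mid-stream flush: they only emit output once a large block fills up (or the stream ends), which is the "terrible compression or very bad latency" trade-off Tom describes.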
Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
> I guess my first question is: why do we need this? There are lots of
> things in the TODO list that someone wanted once upon a time, but
> they're not all actually important. Do you have reason to believe
> that this one is? It's been six years since that email, so it's worth
> asking if this is actually relevant.

As far as I know, pg_control is not WAL-protected, which means that if it gets corrupt for any reason (say, a disk crash during flush leaves it partially written), recovery of the database can fail. The user can then use pg_resetxlog to recover the database. Currently pg_resetxlog works on guessed values for pg_control. This implementation can improve that logic: instead of guessing, it can try to regenerate the values from the WAL. That can allow better recovery in certain circumstances.

> The deadline for patches for this CommitFest is today, so I think you
> should target any work you're starting now for the NEXT CommitFest.

Oh, I am sorry; as this was my first time, I was not fully aware of the deadline. However, I still seek your opinion on whether it makes sense to work on this feature.

-----Original Message-----
From: Robert Haas [mailto:robertmh...@gmail.com]
Sent: Friday, June 15, 2012 12:40 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila wrote:
> I am planning to work on the below Todo list item for this CommitFest
> Allow WAL information to recover corrupted pg_controldata
> http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php

The deadline for patches for this CommitFest is today, so I think you should target any work you're starting now for the NEXT CommitFest.

> I wanted to confirm my understanding about the work involved for this
> patch. The existing patch has the following set of problems:
> 1. Memory leak and linked list code path is not proper
> 2.
lock check for if the server is already running, is removed in the patch
> and needs to be reverted
> 3. Refactoring of the code.
>
> Apart from the above, what I understood from the patch is that its
> intention is to generate values for ControlFile using WAL logs when the
> -r option is used.
>
> The change in algorithm from current will be: if the control file is
> corrupt, which essentially means ReadControlFile() will return False,
> then it should generate values (checkPointCopy, checkPoint,
> prevCheckPoint, state) using WAL if the -r option is enabled.
>
> Also for the -r option, it doesn't need to call the function
> FindEndOfXLOG(), as that work will be achieved by the above point.
>
> It will just rewrite the control file and not do the other resets.
>
> The algorithm for restoring the pg_control values from old xlog files:
> 1. Retrieve all of the active xlog files from the xlog directory into a
> list, in increasing order according to their timeline, log id, and
> segment id.
> 2. Search the list to find the oldest xlog file of the latest timeline.
> 3. Search the records from the oldest xlog file of the latest timeline
> to the latest xlog file of the latest timeline; if a checkpoint record
> is found, update the latest checkpoint and previous checkpoint.
>
> Apart from the above, some changes in code will be required after the
> xlog patch by Heikki.
>
> Please suggest if my understanding is correct.

I guess my first question is: why do we need this? There are lots of things in the TODO list that someone wanted once upon a time, but they're not all actually important. Do you have reason to believe that this one is? It's been six years since that email, so it's worth asking if this is actually relevant.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
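[Editor's note: step 1 and 2 of the scan order described above - sort candidate segment files by (timeline, log, seg) and keep the latest timeline, oldest file first - can be sketched briefly. The 24-hex-digit file-name layout (8 digits each for timeline, log, and segment) follows WAL segment naming of that era; the record-level checkpoint scan itself is elided.]

```python
# Order WAL segment file names so the latest timeline can be scanned from
# its oldest segment to its newest, looking for checkpoint records.

def segments_to_scan(names):
    """Return the latest timeline's segments, oldest first."""
    parsed = []
    for n in names:
        tli = int(n[0:8], 16)    # timeline
        log = int(n[8:16], 16)   # log id
        seg = int(n[16:24], 16)  # segment id
        parsed.append((tli, log, seg, n))
    parsed.sort()
    latest_tli = parsed[-1][0]
    return [name for tli, _, _, name in parsed if tli == latest_tli]

segs = ["000000010000000000000003",
        "000000020000000000000004",
        "000000020000000000000003"]
assert segments_to_scan(segs) == ["000000020000000000000003",
                                  "000000020000000000000004"]
```

The timeline-1 segment is dropped entirely: per step 2, only the latest timeline is searched for the latest and previous checkpoint.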
Re: [HACKERS] WIP: relation metapages
On Thu, Jun 14, 2012 at 5:34 PM, Heikki Linnakangas wrote:
> On 14.06.2012 18:01, Robert Haas wrote:
>> What I'm really looking for at this stage of the game is feedback on
>> the design decisions I made. The intention here is that it should be
>> possible to read old-format heaps and indexes transparently, but that
>> when we create or rewrite a relation, we add a new-style metapage.
>
> That dodges the problem of pg_upgrade, but eventually, we will want to
> use the metapage for something important, and it can't be optional at
> that point anymore. So we will eventually need to deal with pg_upgrade
> somehow.

Well, the code as I've written it deals with pg_upgrade just fine: you can move your old relation files over and they still work. What's missing at present is an efficient way to convert them to the new format. If you don't mind the inefficiency, you can use VACUUM FULL or CLUSTER, and you're there, but obviously we'll want something better before we start relying on this for anything too critical.

For indexes, it should be pretty trivial to reduce the requirement from "rewrite while holding AccessExclusiveLock" to "brief AccessExclusiveLock". Everything except GiST already has a metapage, so you just need to rewrite that page in the new format, which is a SMOP. GiST requires moving the existing metapage out to a free page (or a new page) and writing a metapage pointing to it into block 0, which is again pretty simple.

Heaps are a bit harder. We could adopt your idea of storing a block number in the metablock; any index TIDs that point to block 0 will be interpreted as pointing to that block instead. Then we can basically just relocate block 0 to any free block, or a new one, as with GiST. Alternatively, we can read the tuples in block 0 and reinsert them; vacuum; and then repurpose block 0.

Taking it even further, we could try to do better than "brief AccessExclusiveLock".
That might be possible too, especially for indexes, but I'm not sure that it's necessary, and I haven't thought through the details.

Even if we just get it down to "brief AccessExclusiveLock", I think we might also be able to improve the experience by making autovacuum do conversions automatically. So, if you pg_upgrade, the first autovacuum worker that spins through the database will ConditionalLockAcquire an AccessExclusiveLock on each relation in turn and try to do the conversion. If it can't get the lock it'll keep retrying until it succeeds. Odds are good that for most people this would make the addition of the metapage completely transparent.

On the other hand, if metapages exist mostly to support operations like making an unlogged table logged or vice versa, that's not really necessary: we can add the metapage when someone performs a DDL operation that requires it. There is a lot of optimization possible on top of the basic mechanism, but how much of it makes sense to do, and which parts, depends on exactly which of the many things we could do with this we end up deciding to actually do. I'm trying to start with the basics, which means getting the basic infrastructure in place and working.

> It would be nice to have the oid of the access method in the metapage
> (or some other way to identify it).

Yeah, I was thinking about that. I think a magic number might be preferable to an OID, and we actually already have that as the first 4 bytes of the access method metadata for all index types except GIN. I'm thinking about trying to fix up GIN so that it adds one as well; the trick is to do it in a way that is backward-compatible, which I have an idea how to do but haven't tackled yet. We can add a magic number for heaps as well.
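[Editor's note: identifying an access method from the magic number at the start of its metadata, as discussed above, amounts to a table lookup. The sketch below uses stand-in magic values and a fixed little-endian offset-0 layout; real metapages carry the access-method data after the page header and use native byte order.]

```python
# Dispatch on a 4-byte magic number read from a (simplified) metapage.
import struct

MAGIC_TO_AM = {0x053162: "btree", 0x6440640: "hash"}  # stand-in values

def identify_am(metapage: bytes) -> str:
    """Read the leading 4-byte magic and map it to an access method."""
    (magic,) = struct.unpack_from("<I", metapage, 0)
    return MAGIC_TO_AM.get(magic, "unknown")

page = struct.pack("<I", 0x053162) + b"\x00" * 28
assert identify_am(page) == "btree"
assert identify_am(b"\x00" * 32) == "unknown"
```

An OID would require a catalog lookup to interpret; a magic constant is self-describing even in a raw page dump, which is presumably part of its appeal here.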
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] transforms
Here is my first patch for the transforms feature. This is a mechanism to adapt data types to procedural languages. The previous proposal was here: http://archives.postgresql.org/pgsql-hackers/2012-05/msg00728.php

At the surface, this contains:

- New catalog pg_transform
- CREATE/DROP TRANSFORM

As proofs of concept and hopefully useful applications-to-be, I have included transforms for

- PL/Perl - hstore
- PL/Python - hstore
- PL/Python - ltree

These, along with the documentation for the above-mentioned DDL commands and catalog, can serve as explanations for reviewers.

There are numerous remaining discussion points and possible work items:

*) Currently, each of the above provided transforms is put into a separate subdirectory under contrib. This might be OK, but it is mainly because we cannot build more than one MODULE_big per directory. Fixing that might be more work than it's worth, but if we want to provide more of these in contrib, it might get crowded.

*) Since we have separate extensions for plperl and plperlu, and also for plpython2u and plpython3u, we need one extension for adapting each of these to a given type. You can see under contrib/hstore_plperl what this looks like. Maybe this isn't fixable or worth fixing. (With the directory and packaging not finalized, I haven't included any documentation for these contrib modules yet.)

*) No psql backslash commands yet. Could be added.

*) Permissions: Transforms don't have owners, a bit like casts. Currently, you are allowed to drop a transform if you own both the type and the language. That might be too strict; maybe owning the type and having privileges on the language would be enough.

*) If we want to offer the ability to write transforms to end users, then we need to install all the header files for the languages and the types. This actually worked quite well; including hstore.h and plperl.h, for example, gave you what you needed. In other cases, some headers might need cleaning up.
Also, some magic from the plperl and plpython build systems might need to be exposed, for example to find the header files. See existing modules for how this currently looks.

*) There is currently some syntax schizophrenia. The grammar accepts

CREATE TRANSFORM FOR hstore LANGUAGE plperl (
    FROM SQL WITH hstore_to_plperl,
    TO SQL WITH plperl_to_hstore
);

but pg_dump produces

CREATE TRANSFORM FOR hstore LANGUAGE plperl (
    FROM SQL WITH hstore_to_plperl(hstore),
    TO SQL WITH plperl_to_hstore(internal)
);

The SQL standard allows both. (In the same way that it allows 'DROP FUNCTION foo' without arguments, if it is not ambiguous.) Precedent is that CREATE CAST requires arguments, but CREATE LANGUAGE does not.

*) The issue of how to handle arguments of type "internal" was already discussed at http://archives.postgresql.org/pgsql-hackers/2012-05/msg00857.php and following. I have adopted the suggestion there to prohibit calling functions with arguments of type "internal", but that is probably full of holes and needs to be reviewed carefully.

*) I'm not very familiar with the Perl C API, so the hstore_plperl implementation is probably full of memory leaks and weirdnesses. It needs some help from a PL/Perl expert. (The PL/Python stuff is hopefully better.)

*) ltree_plpython lacks the transform function from Python to ltree, because of laziness and lack of clear documentation on how to construct ltree values.

*) I just noticed that the grammar changes broke ECPG. I'll look into that.

*) The regression tests for the core CREATE/DROP TRANSFORM functionality are in contrib/hstore_plpython, because the core has no suitable language available. It's a bit ugly, but I don't know of a better way short of implementing a fake language for regression testing only.

Attachment: transforms-20120615.patch.gz (GNU Zip compressed data)
Re: [HACKERS] sortsupport for text
On 14 June 2012 20:32, Robert Haas wrote:
> Yeah, but *it doesn't matter*. If you test this on strings that are
> long enough that they get pushed out to TOAST, you'll find that it
> doesn't measurably improve performance, because the overhead of
> detoasting so completely dominates any savings on the palloc side that
> you can't pick them out of the inter-run noise.

That's probably true, but it's also beside the point. As recently as a few hours ago, you yourself said "my guess is that most values people sort by are pretty short, making this concern mostly academic". Why are you getting hung up on toasting now?

> Here we know that it doesn't matter, so the application of Knuth's
> first law of optimization is appropriate.

I'm not advocating some Byzantine optimisation, or even something that could reasonably be described as an optimisation at all here. I'm questioning why you've unnecessarily complicated the code by having the buffer size just big enough to fit the biggest value seen so far, but arbitrarily aligned to a value that is completely irrelevant to bttextfastcmp_locale(), rather than using simple geometric expansion, which is more or less the standard way of managing the growth of a dynamic array.

You have to grow the array in some way. The basic approach I've outlined has something to recommend it - why does it make sense to align the size of the buffer to TEXTBUFLEN in particular though? It's quite easy to imagine what you've done here resulting in an excessive number of allocations (and pfree()s), which *could* be expensive. If you're so conservative about allocating memory, don't grow the array at quite so aggressive a rate as doubling it each time. There is a trade-off between space and time to be made here, but I don't know why you think that the right choice is to use almost the smallest possible amount of memory in all cases.

>> Another concern is that it seems fairly pointless to have two buffers.
>> Wouldn't it be more sensible to have a single buffer that was
>> partitioned to make two logical, equally-sized buffers, given that in
>> general each buffer is expected to grow at exactly the same rate?
>
> Sure, but it would be making the code more complicated in return for
> no measurable performance benefit. We generally avoid that.

Fair enough.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] WAL format changes
On Thursday, June 14, 2012 11:01:42 PM Heikki Linnakangas wrote:
> As I threatened earlier
> (http://archives.postgresql.org/message-id/4fd0b1ab.3090...@enterprisedb.com),
> here are three patches that change the WAL format. The goal is to
> change the format so that when you're inserting a WAL record of a given
> size, you know exactly how much space it requires in the WAL.
>
> 1. Use a 64-bit segment number, instead of the log/seg combination. And
> don't waste the last segment on each logical 4 GB log file. The concept
> of a "logical log file" is now completely gone. XLogRecPtr is unchanged,
> but it should now be understood as a plain 64-bit value, just split into
> two 32-bit integers for historical reasons. On disk, this means that
> there will be log files ending in FF; those were skipped before.

What's the reason for keeping that awkward split now? There aren't that many users of xlogid/xrecoff, and many of those would be better served by using helper macros. API compatibility isn't a great argument either, as code manually playing around with those needs to be checked anyway. I think there might be some code around that does XLogRecPtr addition manually and such.

> 2. Always include the xl_rem_len field, used for continuation records,
> in the xlog page header. A continuation log record only contained that
> one field; it's now included straight in the page header, so the concept
> of a continuation record doesn't exist anymore. Because of alignment,
> this wastes 4 bytes on every page that contains continued data from a
> previous record, and 8 bytes on pages that don't. That's not very much,
> and the next step will buy that back:
>
> 3. Allow the WAL record header to be split across pages. Per Tom's
> suggestion, move xl_tot_len to be the first field in XLogRecord, so that
> even if the header is split, xl_tot_len is always on the first page.
> xl_crc is moved to be the last field, and xl_prev is the second to last.
> This has the advantage that you can calculate the CRC for all the other fields before acquiring WALInsertLock. For xl_prev, you need to know where exactly the record is inserted, so it's handy that it's the last field before CRC. This patch doesn't try to take advantage of that, however, and I'm not sure if that makes any difference once I finish the patch to make XLogInsert scale better, which is the ultimate goal of all this.
>
> Those are the three patches I'd like to get committed in this commitfest. To see where all this is leading to, I've included a rough WIP version of the XLogInsert scaling patch. This version is quite different from the one I posted in spring, it takes advantage of the WAL format changes, and I'm also experimenting with a different method of tracking how far each WAL insertion has progressed. But more on that later.
>
> (Note to self: remember to bump XLOG_PAGE_MAGIC)

Will review.

Andres
-- 
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WAL format changes
On Thursday, June 14, 2012 11:01:42 PM Heikki Linnakangas wrote: > As I threatened earlier > (http://archives.postgresql.org/message-id/4fd0b1ab.3090...@enterprisedb.co > m), here are three patches that change the WAL format. The goal is to > change the format so that when you're inserting a WAL record of a given > size, you know exactly how much space it requires in the WAL. I fear the patches need rebasing after the pgindent run... Even before that (60801944fa105252b48ea5688d47dfc05c695042) it only applies with offsets? Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader
On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:
> On 13.06.2012 14:28, Andres Freund wrote:
> > Features:
> > - streaming reading/writing
> > - filtering
> > - reassembly of records
> >
> > Reusing the ReadRecord infrastructure in situations where the code that wants to do so is not tightly integrated into xlog.c is rather hard and would require changes to rather integral parts of the recovery code which doesn't seem to be a good idea.
>
> It would be nice to refactor ReadRecord and its subroutines out of xlog.c. That file has grown over the years to be really huge, and separating the code to read WAL sounds like it should be a pretty natural split. I don't want to duplicate all the WAL reading code, so we really should find a way to reuse that. I'd suggest rewriting ReadRecord into a thin wrapper that just calls the new xlogreader code.
>
> > Missing:
> > - "compressing" the stream when removing uninteresting records
> > - writing out correct CRCs
> > - validating CRCs
> > - separating reader/writer
> > - comments.
>
> At a quick glance, I couldn't figure out how this works. There seems to be some callback functions? If you want to read an xlog stream using this facility, what do you do? Can this be used for writing WAL, as well as reading? If so, what do you need the write support for?

Oh, btw, the callbacks and parameters are somewhat documented in the xlogreader.h header in the XLogReaderState struct. Still needs improvement though.

Andres
-- 
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] measuring spinning
On Wed, Jan 11, 2012 at 8:48 PM, Robert Haas wrote:
> I've had cause, a few times this development cycle, to want to measure the amount of spinning on each lwlock in the system. To that end, I've found the attached patch useful. Note that if you don't define LWLOCK_STATS, this changes nothing except that the return value from s_lock becomes int rather than void. If you do define LWLOCK_STATS, then LWLockAcquire() counts the number of pg_usleep() calls that are required to acquire each LWLock, in addition to the other statistics. Since this has come up for me a few times now, I'd like to propose including it in core.

Well, this fell through the cracks, because I forgot to add it to the January CommitFest. Here it is again, rebased.

-- 
Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

spindebug-v2.patch Description: Binary data
Re: [HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader
On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:
> On 13.06.2012 14:28, Andres Freund wrote:
> > Features:
> > - streaming reading/writing
> > - filtering
> > - reassembly of records
> >
> > Reusing the ReadRecord infrastructure in situations where the code that wants to do so is not tightly integrated into xlog.c is rather hard and would require changes to rather integral parts of the recovery code which doesn't seem to be a good idea.
>
> It would be nice to refactor ReadRecord and its subroutines out of xlog.c. That file has grown over the years to be really huge, and separating the code to read WAL sounds like it should be a pretty natural split. I don't want to duplicate all the WAL reading code, so we really should find a way to reuse that. I'd suggest rewriting ReadRecord into a thin wrapper that just calls the new xlogreader code.

I agree that it is not very nice to duplicate it. But I also don't want to go the route of replacing ReadRecord with it for a while; we can replace ReadRecord later if we want. As long as it is in flux like it is right now, I don't really see the point in investing energy in it. Also, I am not that sure how a callback-oriented API fits into the xlog.c workflow.

> > Missing:
> > - "compressing" the stream when removing uninteresting records
> > - writing out correct CRCs
> > - validating CRCs
> > - separating reader/writer
> > - comments.
>
> At a quick glance, I couldn't figure out how this works. There seems to be some callback functions? If you want to read an xlog stream using this facility, what do you do?
You currently have to fill out 4 callbacks:

    XLogReaderStateInterestingCB is_record_interesting;
    XLogReaderStateWriteoutCB writeout_data;
    XLogReaderStateFinishedRecordCB finished_record;
    XLogReaderStateReadPageCB read_page;

As an example of how to use it (from the walsender support for START_LOGICAL_REPLICATION):

    if (!xlogreader_state)
    {
        xlogreader_state = XLogReaderAllocate();
        xlogreader_state->is_record_interesting = RecordRelevantForLogicalReplication;
        xlogreader_state->finished_record = ProcessRecord;
        xlogreader_state->writeout_data = WriteoutData;
        xlogreader_state->read_page = XLogReadPage;
        /* startptr is the current XLog position */
        xlogreader_state->startptr = startptr;
        XLogReaderReset(xlogreader_state);
    }

    /* how far does valid data go */
    xlogreader_state->endptr = endptr;

    XLogReaderRead(xlogreader_state);

The last step will then call the above callbacks till it reaches endptr. I.e. it first reads a page with "read_page"; then checks whether a record is interesting for the use-case ("is_record_interesting"); in case it is interesting, it gets reassembled and passed to the "finished_record" callback. Then the bytestream gets written out again with "writeout_data". In this case it gets written to the buffer the walsender has allocated. In others it might just get thrown away.

> Can this be used for writing WAL, as well as reading? If so, what do you need the write support for?

It currently can replace records which are not interesting (e.g. index changes in the case of logical rep). Filtered records are currently replaced with XLOG_NOOP records of the correct length. In future the actual amount of data should really be reduced. I don't yet know how to map LSNs of the uncompressed/compressed stream onto each other...

The filtered data is then passed to a writeout callback (in a streaming fashion). The whole writing-out part is pretty ugly at the moment and I just bolted it on top because it was convenient for the moment.
I am not yet sure how the API for that should look.

Andres
-- 
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WIP: relation metapages
On 14.06.2012 18:01, Robert Haas wrote:
> What I'm really looking for at this stage of the game is feedback on the design decisions I made. The intention here is that it should be possible to read old-format heaps and indexes transparently, but that when we create or rewrite a relation, we add a new-style metapage. That dodges the problem of pg_upgrade, but eventually, we will want to use the metapage for something important, and it can't be optional at that point anymore. So we will eventually need to deal with pg_upgrade somehow.
>
> I put the new metapage code in src/backend/access/common/metapage.c, but I don't have a lot of confidence that that's the appropriate location for it. Suggestions are appreciated.

That seems good to me. It would be nice to have the oid of the access method in the metapage (or some other way to identify it).

-- 
Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader
On 13.06.2012 14:28, Andres Freund wrote:
> Features:
> - streaming reading/writing
> - filtering
> - reassembly of records
>
> Reusing the ReadRecord infrastructure in situations where the code that wants to do so is not tightly integrated into xlog.c is rather hard and would require changes to rather integral parts of the recovery code which doesn't seem to be a good idea.

It would be nice to refactor ReadRecord and its subroutines out of xlog.c. That file has grown over the years to be really huge, and separating the code to read WAL sounds like it should be a pretty natural split. I don't want to duplicate all the WAL reading code, so we really should find a way to reuse that. I'd suggest rewriting ReadRecord into a thin wrapper that just calls the new xlogreader code.

> Missing:
> - "compressing" the stream when removing uninteresting records
> - writing out correct CRCs
> - validating CRCs
> - separating reader/writer
> - comments.

At a quick glance, I couldn't figure out how this works. There seems to be some callback functions? If you want to read an xlog stream using this facility, what do you do? Can this be used for writing WAL, as well as reading? If so, what do you need the write support for?

-- 
Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode
On Thu, Jun 14, 2012 at 5:00 PM, Robert Haas wrote: > On Thu, Jun 14, 2012 at 4:51 PM, Andres Freund wrote: >> Lets sidetrack this till we have a tender agreement on how to handle DDL ;). >> I >> am aware of the issues with rollbacks, truncate et al... > > Agreed; I will write up my thoughts about DDL on the other thread. I > think that's a key thing we need to figure out; once we understand how > we're handling that, the correct design for this will probably fall > out pretty naturally. I wonder if *an* answer (if not forcibly a perfect one) is to provide a capturable injection point. A possible shape of this would be to have a function to which you pass a DDL statement, at which point, it does two things: a) Runs the DDL, to make the requested change, and b) Captures the DDL in a convenient form in the WAL log so that it may be detected and replayed at the right point in processing. That's not a solution for capturing DDL automatically, but it's something that would be useful-ish even today, for systems like Slony and Londiste, and this would be natural to connect to Dimitri's "Event Triggers." That also fits with the desire to have components that are (at least hopefully) usable to other existing replication systems. -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?" -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode
On Thu, Jun 14, 2012 at 4:51 PM, Andres Freund wrote: > Lets sidetrack this till we have a tender agreement on how to handle DDL ;). I > am aware of the issues with rollbacks, truncate et al... Agreed; I will write up my thoughts about DDL on the other thread. I think that's a key thing we need to figure out; once we understand how we're handling that, the correct design for this will probably fall out pretty naturally. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode
On Thursday, June 14, 2012 08:50:51 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund wrote:
> > This patch is problematic because formally indexes used by syscaches need to be unique; this one is not though, because of 0/InvalidOid entries for nailed/shared catalog entries. Those values aren't allowed to be queried though.
>
> That's not the only reason it's not unique. Take a look at GetRelFileNode(). We really only guarantee that <database OID, tablespace OID, relfilenode, backend-ID>, taken as a four-tuple, is unique. You could have the same relfilenode in different tablespaces, or even within the same tablespace with different backend-IDs. The latter might not matter for you because you're presumably disregarding temp tables, but the former probably does. It's an uncommon scenario because we normally set relid = relfilenode, and of course relid is unique across the database, but if the table gets rewritten then you end up with relid != relfilenode, and I don't think there's anything at that point that will prevent the new relfilenode from being chosen as some other relation's relfilenode, as long as it's in a different tablespace.
>
> I think the solution may be to create a specialized cache for this, rather than relying on the general syscache infrastructure. You might look at, e.g., attoptcache.c for an example. That would allow you to build a cache that is aware of things like the relmapper infrastructure, and the fact that temp tables are ignorable for your purposes. But I think you will need to include at least the tablespace OID in the key along with the relfilenode to make it bullet-proof.

Yes, the tablespace OID should definitely be in there. I need to read up on the details of building our own cache. Once more, I didn't want to put in more work before discussing it here.
> I haven't read through the patch series far enough to know what this is being used for yet, but my fear is that you're using it to handle mapping a relfilenode extracted from the WAL stream back to a relation OID. The problem with that is that relfilenode assignments obey transaction semantics. So, if someone begins a transaction, truncates a table, inserts a tuple, and commits, the heap_insert record is going to refer to a relfilenode that, according to the system catalogs, doesn't exist. This is similar to one of the worries in my other email, so I won't belabor the point too much more here...

Well, yes. We *need* to do the mapping back from the relfilenode to a table. The idea is that because the receiving side, be it a full cluster or just a catalog-only one, has a fully synchronized catalog in which DDL gets applied correctly, inside the transaction just as on the sending side, it should never be wrong to do that mapping. It probably is necessary to make the syscache lookup/infrastructure use an MVCCish snapshot though. No idea how hard that would be yet. Might be a good argument for your suggestion of using a specialized cache.

Lets sidetrack this till we have a tender agreement on how to handle DDL ;). I am aware of the issues with rollbacks, truncate et al...

Thanks,

Andres
-- 
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
Hi Robert,

Thanks for your answer.

On Thursday, June 14, 2012 06:17:26 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 7:27 AM, Andres Freund wrote:
> > === Design goals for logical replication ===
> > - in core
> > - fast
> > - async
> > - robust
> > - multi-master
> > - modular
> > - as unintrusive as possible implementation wise
> > - basis for other technologies (sharding, replication into other DBMSs, ...)
>
> I agree with all of these goals except for "multi-master". I am not sure that there is a need to have a multi-master replication solution in core. The big tricky part of multi-master replication is conflict resolution, and that seems like an awful lot of logic to try to build into core - especially given that we will want it to be extensible.

I don't plan to throw in loads of conflict resolution smarts. The aim is to get to the place where all the infrastructure is there so that a MM solution can be built by basically plugging in a conflict resolution mechanism. Maybe providing a very simple one. I think without in-core support it's really, really hard to build a sensible MM implementation. Which doesn't mean it has to live entirely in core.

Loads of the use-cases we have seen lately have a relatively small, low-conflict shared dataset and a far bigger sharded one. While that obviously isn't the only relevant use case, it is a sensible and important one.

> More generally, I would much rather see us focus on efficiently extracting changesets from WAL and efficiently applying those changesets on another server. IMHO, those are the things that are holding back the not-in-core replication solutions we have today, particularly the first. If we come up with a better way of applying change-sets, well, Slony can implement that too; they are already installing C code. What neither they nor any other non-core solution can implement is change-set extraction, and therefore I think that ought to be the focus.
It definitely is a very important focus. I don't think it is the only one, but that doesn't seem to be a problem to me as long as everything is kept fairly modular (which I tried rather hard to do).

> To put all that another way, I think it is a 100% bad idea to try to kill off Slony, Bucardo, Londiste, or any of the many home-grown solutions that are out there to do replication. Even if there were no technical place for third-party replication products (and I think there is), we will not win many friends by making it harder to extend and add on to the server. If we build an in-core replication solution that can be used all by itself, that is fine with me. But I think it should also be able to expose its constituent parts as a toolkit for third-party solutions.

I agree 100%. Unfortunately I forgot to make that point explicitly, but the plan is definitely to make the life of other replication solutions easier, not harder. I don't think there will ever be one replication solution that fits every use-case perfectly.

At pgcon I talked with some of the Slony guys and they were definitely interested in the changeset generation, and I have kept that in mind. If some problems that need resolving independently of this issue get resolved (namely DDL), it shouldn't take much to generate their output format. The 'apply' code is fully abstracted and separated.
> > While you may argue that most of the above design goals are already provided by various trigger based replication solutions like Londiste or Slony, we think that that's not enough for various reasons:
> >
> > - not in core (and thus less trustworthy)
> > - duplication of writes due to an additional log
> > - performance in general (check the end of the above presentation)
> > - complex to use because there is no native administration interface
>
> I think that your parenthetical note "(and thus less trustworthy)" gets at another very important point, which is that one of the standards for inclusion in core is that it must in fact be trustworthy enough to justify the confidence that users will place in it. It will NOT benefit the project to have two replication solutions in core, one of which is crappy. More, even if what we put in core is AS GOOD as the best third-party solutions that are available, I don't think that's adequate. It has to be better. If it isn't, there is no excuse for preempting what's already out there.

I agree that it has to be very good. *But* I think it is totally acceptable if it doesn't have all the bells and whistles from the start. That would be a sure road to disaster. For one, implementing all that takes time, and for another, the amount of discussion until we are there is rather huge.

> I imagine you are thinking along similar lines, but I think that it bears being explicit about.

Seems like we're thinking along the same lines, yes.

> > The biggest problem is, that interpreting tuples in the WAL stream requires an up-to-date system
Re: [HACKERS] libpq compression
On Thu, Jun 14, 2012 at 02:38:02PM -0500, Merlin Moncure wrote:
> On Thu, Jun 14, 2012 at 1:43 PM, Tom Lane wrote:
> > So I've got very little patience with the idea of "let's put in some hooks and then great things will happen". It would be far better all around if we supported exactly one, well-chosen, method. But really I still don't see a reason not to let openssl do it for us.
>
> Well, for toast compression the right choice is definitely one of the lz-based algorithms (not libz!). For transport compression you have the case of sending large data over very slow and/or expensive links, in which case you want to use bzip-type methods. But in the majority of cases I'd probably be using lz there too. So if I had to pick just one, there you go. But which one? The lz algorithm with arguably the best pedigree (lzo) is GPL-licensed, but there are many other decent candidates, some of which have really tiny implementations.
>
> merlin

+1 for a very fast compressor/de-compressor. lz4 has a BSD license and, at 8.5X faster compression than zlib(-1), 5X faster decompression than zlib(-1), and 2X faster than even LZO, it would be my pick.

Regards,
Ken
Re: [HACKERS] libpq compression
On Thu, Jun 14, 2012 at 1:43 PM, Tom Lane wrote:
> So I've got very little patience with the idea of "let's put in some hooks and then great things will happen". It would be far better all around if we supported exactly one, well-chosen, method. But really I still don't see a reason not to let openssl do it for us.

Well, for toast compression the right choice is definitely one of the lz-based algorithms (not libz!). For transport compression you have the case of sending large data over very slow and/or expensive links, in which case you want to use bzip-type methods. But in the majority of cases I'd probably be using lz there too. So if I had to pick just one, there you go. But which one? The lz algorithm with arguably the best pedigree (lzo) is GPL-licensed, but there are many other decent candidates, some of which have really tiny implementations.

merlin
Re: [HACKERS] Patch pg_is_in_backup()
On Thu, Jun 14, 2012 at 3:20 PM, Robert Haas wrote:
> Also, you had the same primary content for both comments, and you incorrectly reversed the sense of the FreeFile() test.

I am known for my fat fingers :)

> Committed with those changes.

Thanks!

-- 
Gurjeet Singh EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: [HACKERS] sortsupport for text
On Thu, Jun 14, 2012 at 3:24 PM, Peter Geoghegan wrote:
> On 14 June 2012 19:28, Robert Haas wrote:
>> I thought that doubling repeatedly would be overly aggressive in terms of memory usage. Blowing the buffers out to 8kB because we hit a string that's a bit over 4kB isn't so bad, but blowing them out to 256MB because we hit a string that's a bit over 128MB seems a bit excessive.
>
> That's pretty much what all popular dynamic array implementations do, from C++'s std::vector to Python's list (it's a misnomer). Having to allocate 256MB for a buffer to contain a string a bit over 128MB may seem excessive, until you later get a 250MB string. Even if doubling is generally excessive, which I doubt, that's beside the point, which is that expanding the array by some constant proportion results in each insertion taking amortized constant time.

Yeah, but *it doesn't matter*. If you test this on strings that are long enough that they get pushed out to TOAST, you'll find that it doesn't measurably improve performance, because the overhead of detoasting so completely dominates any savings on the palloc side that you can't pick them out of the inter-run noise. Risking eating up an extra 100MB of memory that we don't really need in order to obtain a performance optimization that is far too small to measure does not make sense.

The case with std::vector is not analogous; they don't have any way of knowing what other overhead you are incurring between insertions into the vector, so it's reasonable to suppose that the cost of the vector insertions themselves might be material. Here we know that it doesn't matter, so the application of Knuth's first law of optimization is appropriate.

>> Suppose we expand the buffer a kilobyte at a time from an initial size of 1kB all the way out to 256MB. That's 256,000 palloc calls, so we must be sorting at least 256,000 datums, at least 128,000 of which are longer than 128MB.
>> I think the cost of calling memcpy() and strcoll() repeatedly on all those long datums - not to mention repeatedly detoasting them - is going to bludgeon the palloc overhead into complete insignificance.
>
> I fail to understand how this sortsupport buffer fundamentally differs from a generic dynamic array abstraction built to contain chars. That being the case, I see no reason not to just do what everyone else does when expanding dynamic arrays, and no reason why we shouldn't make essentially the same time-space trade-off here as others do elsewhere.
>
> Another concern is that it seems fairly pointless to have two buffers. Wouldn't it be more sensible to have a single buffer that was partitioned to make two logical, equally-sized buffers, given that in general each buffer is expected to grow at exactly the same rate?

Sure, but it would be making the code more complicated in return for no measurable performance benefit. We generally avoid that.

-- 
Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] sortsupport for text
On 14 June 2012 19:28, Robert Haas wrote: > I thought that doubling repeatedly would be overly aggressive in terms > of memory usage. Blowing the buffers out to 8kB because we hit a > string that's a bit over 4kB isn't so bad, but blowing them out to > 256MB because we hit a string that's a bit over 128MB seems a bit > excessive. That's pretty much what all popular dynamic array implementations do, from C++'s std::vector to Python's list (it's a misnomer). Having to allocate 256MB for a buffer to contain a string a bit over 128MB may seem excessive, until you later get a 250MB string. Even if doubling is generally excessive, which I doubt, that's beside the point, which is that expanding the array by some constant proportion results in each insertion taking amortized constant time. > Suppose we expand the buffer a > kilobyte at a time from an initial size of 1kB all the way out to > 256MB. That's 256,000 palloc calls, so we must be sorting at least > 256,000 datums, at least 128,000 of which are longer than 128MB. I > think the cost of calling memcpy() and strcoll() repeatedly on all > those long datums - not to mention repeatedly detoasting them - is > going to bludgeon the palloc overhead into complete insignificance. I fail to understand how this sortsupport buffer fundamentally differs from a generic dynamic array abstraction built to contain chars. That being the case, I see no reason not to just do what everyone else does when expanding dynamic arrays, and no reason why we shouldn't make essentially the same time-space trade-off here as others do elsewhere. Another concern is that it seems fairly pointless to have two buffers. Wouldn't it be more sensible to have a single buffer that was partitioned to make two logical, equally-sized buffers, given that in general each buffer is expected to grow at exactly the same rate? 
-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch pg_is_in_backup()
On Thu, Jun 14, 2012 at 3:10 PM, Gurjeet Singh wrote: >> Well, according to the comments for AllocateFile: >> >> * fd.c will automatically close all files opened with AllocateFile at >> * transaction commit or abort; this prevents FD leakage if a routine >> * that calls AllocateFile is terminated prematurely by ereport(ERROR). > > I bet anyone else looking at this code, who is not in the know, will trip > over this again. > > Another problem with that code block is that it will throw "could not read" > even though read has succeeded but FreeFile() failed. > > I say we break it into two blocks, one to handle read error, and then close > the file separately. Also, either make sure FreeFile() is called in all code > paths, or not call FreeFile() at all and reference to the comment above > AllocateFile(). > > Patch attached. I agree with breaking it into two blocks, but I don't agree that the comments need to recapitulate how AllocateFile works. Also, you had the same primary content for both comments, and you incorrectly reversed the sense of the FreeFile() test. Committed with those changes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMENT on function's arguments
On Tue, Jun 12, 2012 at 10:59 PM, Vlad Arkhipov wrote: > Does it make sense to have a comment on a function's arguments? Of course it > is possible to include these comments in a function's comment, but it may be > better to have them in a more formalized way, like comments on columns of a > table. IDEs may use this information when providing hints for a function, > like in other languages. This would be somewhat tricky, because our COMMENT support assumes that the object upon which we're commenting has an ObjectAddress, and individual arguments to a function don't, although perhaps the sub-object-id stuff that we currently use to handle comments on table columns could be extended to handle this case. I guess I wouldn't object to a well-done patch that made this work, but creating such a patch seems likely to be tricky, owing to the fact that there's nothing in the system that thinks of the individual arguments to a function as separate objects at present. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch pg_is_in_backup()
On Thu, Jun 14, 2012 at 2:02 PM, Robert Haas wrote: > On Thu, Jun 14, 2012 at 1:48 PM, Gurjeet Singh > wrote: > > On Thu, Jun 14, 2012 at 1:29 PM, Robert Haas > wrote: > >> Committed. > > > > > > A minor gripe: > > > > + /* > > + * Close the backup label file. > > + */ > > + if (ferror(lfp) || FreeFile(lfp)) { > > +ereport(ERROR, > > +(errcode_for_file_access(), > > +errmsg("could not read file \"%s\": %m", > > + BACKUP_LABEL_FILE))); > > + } > > + > > > > If ferror(lfp) returns false, wouldn't we miss the FreeFile() and leak a > > file pointer? > > Well, according to the comments for AllocateFile: > > * fd.c will automatically close all files opened with AllocateFile at > * transaction commit or abort; this prevents FD leakage if a routine > * that calls AllocateFile is terminated prematurely by ereport(ERROR). > I bet anyone else looking at this code, who is not in the know, will trip over this again. Another problem with that code block is that it will throw "could not read" even though the read has succeeded but FreeFile() failed. I say we break it into two blocks, one to handle read errors, and then close the file separately. Also, either make sure FreeFile() is called in all code paths, or not call FreeFile() at all and refer to the comment above AllocateFile(). Patch attached. Regards, -- Gurjeet Singh EnterpriseDB Corporation The Enterprise PostgreSQL Company FreeFile.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila wrote: > I am planning to work on the below Todo list item for this CommitFest > Allow WAL information to recover corrupted pg_controldata > http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php The deadline for patches for this CommitFest is today, so I think you should target any work you're starting now for the NEXT CommitFest. > I wanted to confirm my understanding about the work involved for this patch: > The existing patch has the following set of problems: > 1. Memory leak and linked list code path is not proper > 2. lock check for if the server is already running, is removed in patch > which needs to be reverted > 3. Refactoring of the code. > > Apart from the above, what I understood from the patch is that its intention is > to generate values for ControlFile using WAL logs when the -r option is used. > > The change in algorithm from the current one will be: if the control file is corrupt, > which essentially means ReadControlFile() will return False, then it should > generate values (checkPointCopy, checkPoint, prevCheckPoint, state) using > WAL if the -r option is enabled. > > Also for the -r option, it doesn't need to call the function FindEndOfXLOG(), as > that work will be achieved by the above point. > > It will just rewrite the control file and not do other resets. > > > The algorithm for restoring the pg_control values from old xlog files: > 1. Retrieve all of the active xlog files from the xlog directory into a list > in increasing order, according to their timeline, log id, segment id. > 2. Search the list to find the oldest xlog file of the latest time line. > 3. Search the records from the oldest xlog file of the latest time line to > the latest xlog file of the latest time line; if a checkpoint record > has been found, update the latest checkpoint and previous checkpoint. > > > > Apart from the above, some changes in code will be required after the Xlog patch > by Heikki. > > Please let me know if my understanding is correct.
I guess my first question is: why do we need this? There are lots of things in the TODO list that someone wanted once upon a time, but they're not all actually important. Do you have reason to believe that this one is? It's been six years since that email, so it's worth asking if this is actually relevant. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode
On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund wrote: > This patch is problematic because formally indexes used by syscaches need to > be unique, this one is not though because of 0/InvalidOids entries for > nailed/shared catalog entries. Those values aren't allowed to be queried > though. That's not the only reason it's not unique. Take a look at GetRelFileNode(). We really only guarantee that (database, tablespace, relfilenode, backend-ID), taken as a four-tuple, is unique. You could have the same relfilenode in different tablespaces, or even within the same tablespace with different backend-IDs. The latter might not matter for you because you're presumably disregarding temp tables, but the former probably does. It's an uncommon scenario because we normally set relid = relfilenode, and of course relid is unique across the database, but if the table gets rewritten then you end up with relid != relfilenode, and I don't think there's anything at that point that will prevent the new relfilenode from being chosen as some other relation's relfilenode, as long as it's in a different tablespace. I think the solution may be to create a specialized cache for this, rather than relying on the general syscache infrastructure. You might look at, e.g., attoptcache.c for an example. That would allow you to build a cache that is aware of things like the relmapper infrastructure, and the fact that temp tables are ignorable for your purposes. But I think you will need to include at least the tablespace OID in the key along with the relfilenode to make it bullet-proof. I haven't read through the patch series far enough to know what this is being used for yet, but my fear is that you're using it to handle mapping a relfilenode extracted from the WAL stream back to a relation OID. The problem with that is that relfilenode assignments obey transaction semantics.
So, if someone begins a transaction, truncates a table, inserts a tuple, and commits, the heap_insert record is going to refer to a relfilenode that, according to the system catalogs, doesn't exist. This is similar to one of the worries in my other email, so I won't belabor the point too much more here... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq compression
Euler Taveira writes: > On 14-06-2012 02:19, Tom Lane wrote: >> ... I especially think that importing bzip2 >> is a pretty bad idea --- it's not only a new dependency, but bzip2's >> compression versus speed tradeoff is entirely inappropriate for this >> use-case. > I see, the idea is that bzip2 would be a compiled-in option (not enabled by > default) just to give another compression option. I'm not particularly thrilled with "let's have more compression options just to have options". Each such option you add is another source of fail-to-connect incompatibilities (when either the client or the server doesn't have it). Moreover, while there are a lot of compression algorithms out there, a lot of them are completely unsuited for this use-case. If memory serves, bzip2 for example requires fairly large data blocks in order to get decent compression, which means you are either going to get terrible compression or suffer very bad latency when trying to apply it to a connection data stream. So I've got very little patience with the idea of "let's put in some hooks and then great things will happen". It would be far better all around if we supported exactly one, well-chosen, method. But really I still don't see a reason not to let openssl do it for us. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch pg_is_in_backup()
On Thu, Jun 14, 2012 at 2:05 PM, Fujii Masao wrote: > On Fri, Jun 15, 2012 at 2:29 AM, Robert Haas wrote: >> On Thu, Jun 14, 2012 at 6:06 AM, Gabriele Bartolini >> wrote: >>> thank you very much for your patience (and thank you Marco for supporting >>> me). I apologise for the delay. >>> >>> I have retested the updated patch and it works fine with me. It is "ready >>> for committer" for me. >> >> Committed. > > I think the attached patch needs to be applied. Thanks, committed. (Somebody should really give you your own commit bit.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On Thu, Jun 14, 2012 at 2:10 PM, Peter Geoghegan wrote: > Why have you made the reusable buffer managed by sortsupport > TEXTBUFLEN-aligned? The existing rationale for that constant (whose > value is 1024) does not seem to carry forward here: > > * This should be large enough that most strings will be fit, but small > * enough that we feel comfortable putting it on the stack. Well, as the comments explain:

+ /*
+  * We avoid repeated palloc/pfree on long strings by keeping the buffers
+  * we allocate around for the duration of the sort.  When we expand them,
+  * we round off to the next multiple of TEXTBUFLEN in order to avoid
+  * repeatedly expanding them by very small amounts.
+  */

> ISTM it would be on average worth the hit of having to repalloc a few > more times for larger strings by making that buffer much smaller > initially, and doubling its size each time that proved insufficient, > rather than increasing its size to the smallest possible > TEXTBUFLEN-aligned size that you can get away with for the immediately > subsequent memcpy. Realistically, any database I've ever worked with > would probably be able to fit a large majority of its text strings > into 16 chars of memory - you yourself said that sorting toasted text > isn't at all common. I thought that doubling repeatedly would be overly aggressive in terms of memory usage. Blowing the buffers out to 8kB because we hit a string that's a bit over 4kB isn't so bad, but blowing them out to 256MB because we hit a string that's a bit over 128MB seems a bit excessive. Also, I don't think it really saves anything from a performance point of view. The worst case for this is if we're asked to repeatedly compare strings that expand the buffer by a kilobyte each time. First, how likely is that to happen in a real world test case?
In many cases, most of the input strings will be of approximately equal length; also in many cases, that length will be short; even if the lengths take the worst possible distribution, we have to hit them in the worst possible order for this to be a problem. Second, even if it does happen, does it really matter? Suppose we expand the buffer a kilobyte at a time from an initial size of 1kB all the way out to 256MB. That's 256,000 palloc calls, so we must be sorting at least 256,000 datums, at least 128,000 of which are longer than 128MB. I think the cost of calling memcpy() and strcoll() repeatedly on all those long datums - not to mention repeatedly detoasting them - is going to bludgeon the palloc overhead into complete insignificance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On Thu, Jun 14, 2012 at 1:56 PM, Peter Geoghegan wrote: > On 14 June 2012 17:35, Robert Haas wrote: >> The problem with pre-detoasting to save comparison cycles is that you >> can now fit many, many fewer tuples in work_mem. There might be cases >> where it wins (for example, because the entire data set fits even >> after decompressing everything) but in most cases it seems like a >> loser. > > If I had to guess, I'd say you're probably right about that - > optimising sorting toasted text doesn't seem like a terribly sensible > use of your time. > > What about the strxfrm suggestion of Greg's? You might find that the > added benefit of being able to avail of a highly optimised strcmp() > tipped the balance in favour of that idea, beyond the simple fact that > there's only a linear number of what you might loosely call "strcoll_l > units of work" rather than as many as O(n ^ 2). Furthermore, I'd > speculate that if you were to interlace the strxfrm() calls with > copying each text string, somewhere like within a specialised > datumCopy(), that would make the approach more efficient still, as you > specify a location for the blob in the just-palloc()'d leading-key > private memory directly, rather than just using memcpy. Well, it's still got the problem of blowing up memory usage. I just can't get excited about optimizing for the case where we can consume 10x the memory and still fit in work_mem. If we've got that case, the sort is gonna be pretty fast anyway. The case where preprocessing wins is when there are going to be a large number of comparisons against each tuple - i.e. lg(N) is large. But the cases where we could pre-transform with strxfrm are those where the data fits in a small percentage of work mem - i.e. lg(N) is small. I'm open to somebody showing up with a test result that demonstrates that it's worthwhile, but to me it seems like it's chasing diminishing returns. 
The point of this patch isn't really to improve things for the collation-aware case, although it's nice that it does. The point is rather to shave off a double-digit percentage off the time it takes to do the sort required to build a C-collation index, which is what people should be using when they don't care about < and >, which most don't. Despite Tom's concerns, I don't think there's anything in this patch that can't be fairly easily revised at a later date if we decide we want a different API. I think it's worth picking the low-hanging fruit in the meantime. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On 2 March 2012 20:45, Robert Haas wrote: > I decided to investigate the possible virtues of allowing "text" to > use the sortsupport infrastructure, since strings are something people > often want to sort. I should mention up-front that I agree with the idea that it is worth optimising text sorting because it is a very common thing to have to do, and therefore the standard for inclusion ought to be lower. I don't intend to talk about tapesort though - that isn't really fair, not least because I have some serious doubts about the quality of our implementation. Furthermore, I think that it is logical that doing things like resolving collations occur within a preparatory function in advance of sorting, rather than redundantly doing that for each and every comparison. Why have you made the reusable buffer managed by sortsupport TEXTBUFLEN-aligned? The existing rationale for that constant (whose value is 1024) does not seem to carry forward here: * This should be large enough that most strings will be fit, but small * enough that we feel comfortable putting it on the stack. ISTM it would be on average worth the hit of having to repalloc a few more times for larger strings by making that buffer much smaller initially, and doubling its size each time that proved insufficient, rather than increasing its size to the smallest possible TEXTBUFLEN-aligned size that you can get away with for the immediately subsequent memcpy. Realistically, any database I've ever worked with would probably be able to fit a large majority of its text strings into 16 chars of memory - you yourself said that sorting toasted text isn't at all common. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch pg_is_in_backup()
On Fri, Jun 15, 2012 at 2:29 AM, Robert Haas wrote: > On Thu, Jun 14, 2012 at 6:06 AM, Gabriele Bartolini > wrote: >> thank you very much for your patience (and thank you Marco for supporting >> me). I apologise for the delay. >> >> I have retested the updated patch and it works fine with me. It is "ready >> for committer" for me. > > Committed. I think the attached patch needs to be applied. Regards, -- Fujii Masao doc_v1.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch pg_is_in_backup()
On Thu, Jun 14, 2012 at 1:48 PM, Gurjeet Singh wrote: > On Thu, Jun 14, 2012 at 1:29 PM, Robert Haas wrote: >> >> On Thu, Jun 14, 2012 at 6:06 AM, Gabriele Bartolini >> wrote: >> > thank you very much for your patience (and thank you Marco for >> > supporting >> > me). I apologise for the delay. >> > >> > I have retested the updated patch and it works fine with me. It is >> > "ready >> > for committer" for me. >> >> Committed. > > > A minor gripe: > > + /* > + * Close the backup label file. > + */ > + if (ferror(lfp) || FreeFile(lfp)) { > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not read file \"%s\": %m", > + BACKUP_LABEL_FILE))); > + } > + > > If ferror(lfp) returns false, wouldn't we miss the FreeFile() and leak a > file pointer? Well, according to the comments for AllocateFile: * fd.c will automatically close all files opened with AllocateFile at * transaction commit or abort; this prevents FD leakage if a routine * that calls AllocateFile is terminated prematurely by ereport(ERROR). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On 14 June 2012 17:35, Robert Haas wrote: > The problem with pre-detoasting to save comparison cycles is that you > can now fit many, many fewer tuples in work_mem. There might be cases > where it wins (for example, because the entire data set fits even > after decompressing everything) but in most cases it seems like a > loser. If I had to guess, I'd say you're probably right about that - optimising sorting toasted text doesn't seem like a terribly sensible use of your time. What about the strxfrm suggestion of Greg's? You might find that the added benefit of being able to avail of a highly optimised strcmp() tipped the balance in favour of that idea, beyond the simple fact that there's only a linear number of what you might loosely call "strcoll_l units of work" rather than as many as O(n ^ 2). Furthermore, I'd speculate that if you were to interlace the strxfrm() calls with copying each text string, somewhere like within a specialised datumCopy(), that would make the approach more efficient still, as you specify a location for the blob in the just-palloc()'d leading-key private memory directly, rather than just using memcpy. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch pg_is_in_backup()
On Thu, Jun 14, 2012 at 1:29 PM, Robert Haas wrote: > On Thu, Jun 14, 2012 at 6:06 AM, Gabriele Bartolini > wrote: > > thank you very much for your patience (and thank you Marco for > supporting > > me). I apologise for the delay. > > > > I have retested the updated patch and it works fine with me. It is > "ready > > for committer" for me. > > Committed. A minor gripe:

+ /*
+  * Close the backup label file.
+  */
+ if (ferror(lfp) || FreeFile(lfp)) {
+     ereport(ERROR,
+             (errcode_for_file_access(),
+              errmsg("could not read file \"%s\": %m",
+                     BACKUP_LABEL_FILE)));
+ }
+

If ferror(lfp) returns false, wouldn't we miss the FreeFile() and leak a file pointer? Regards, -- Gurjeet Singh EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: [HACKERS] psql tab completion for GRANT role
On Sun, Jan 8, 2012 at 3:48 PM, Peter Eisentraut wrote: > psql tab completion currently only supports the form GRANT privilege ON > something TO someone (and the analogous REVOKE), but not the form GRANT > role TO someone. Here is a patch that attempts to implement the latter. This seems to have fallen through the cracks. It doesn't apply any more, but one general comment is that it seems undesirable to repeatedly recapitulate the list of all privileges that exist in the system. That's a lot of places that someone will have to find and fix when new privileges are added. But +1 on the general idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Namespace of array of user defined types is confused by the parser in insert?
On Mon, Apr 23, 2012 at 9:42 AM, Krzysztof Nienartowicz wrote: > Hello, > Sorry for re-posting - I initially posted this in pgsql.sql - probably > this group is more appropriate. > > I have a bizarre problem that started to manifest itself after > the addition of a field that is an array of compound UDTs to a table > declared in multiple schemas. > It is clearly related to how the type namespace is resolved, and shows > up for the JDBC client (probably related to the parameterized query, as > the static query works without problems). I'm replying to this awfully late, but I'm guessing this is some kind of JDBC magic, not anything that PostgreSQL is causing directly. You might want to post to pgsql-jdbc, if you haven't already. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch pg_is_in_backup()
On Thu, Jun 14, 2012 at 6:06 AM, Gabriele Bartolini wrote: > thank you very much for your patience (and thank you Marco for supporting > me). I apologise for the delay. > > I have retested the updated patch and it works fine with me. It is "ready > for committer" for me. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minimising windows installer password confusion
On Thu, Jun 14, 2012 at 5:38 PM, Robert Haas wrote: > On Thu, Jun 14, 2012 at 11:59 AM, Dave Page wrote: >> On Thu, Jun 14, 2012 at 11:43 AM, Dave Page wrote: >>> >>> I'll have a play with it and see if a simple switch to NetworkService >>> seems feasible. >> >> OK, I worked up a patch which uses "NT AUTHORITY\NetworkService" as >> the service account by default. This doesn't need a password, so >> allows us to simply prompt during installation for the superuser >> password for the cluster, and not at all during upgrade. If you run >> the installer from the command line with "--serviceaccount postgres" >> (or some other account name), you get the current behaviour. >> >> I've posted it on our internal ReviewBoard system for the rest of the >> team to review and test on various platforms (I've only tried it on XP >> so far). Now would be a very good time for people to yelp if they >> think this is a bad idea (the only downside I can see is accessing >> files for server-side copy etc., but users that want to do that can >> install with a non-default account). If we go ahead, we'll include it >> in 9.2. > > What happens if they try to use this to upgrade from the EDB 9.1 installers? The installers don't support major version upgrades - it'll install alongside 9.1. If someone has an existing beta installation of 9.2, it'll use the service account it was installed with, just as past minor-version upgrades would (including asking for the password). -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minimising windows installer password confusion
On Thu, Jun 14, 2012 at 11:59 AM, Dave Page wrote: > On Thu, Jun 14, 2012 at 11:43 AM, Dave Page wrote: >> >> I'll have a play with it and see if a simple switch to NetworkService >> seems feasible. > > OK, I worked up a patch which uses "NT AUTHORITY\NetworkService" as > the service account by default. This doesn't need a password, so > allows us to simply prompt during installation for the superuser > password for the cluster, and not at all during upgrade. If you run > the installer from the command line with "--serviceaccount postgres" > (or some other account name), you get the current behaviour. > > I've posted it on our internal ReviewBoard system for the rest of the > team to review and test on various platforms (I've only tried it on XP > so far). Now would be a very good time for people to yelp if they > think this is a bad idea (the only downside I can see is accessing > files for server-side copy etc., but users that want to do that can > install with a non-default account). If we go ahead, we'll include it > in 9.2. What happens if they try to use this to upgrade from the EDB 9.1 installers? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On Thu, Jun 14, 2012 at 11:36 AM, Peter Geoghegan wrote: > On 18 March 2012 15:08, Tom Lane wrote: >> One other thing I've always wondered about in this connection is the >> general performance of sorting toasted datums. Is it better to detoast >> them in every comparison, or pre-detoast to save comparison cycles at >> the cost of having to push much more data around? I didn't see any >> discussion of this point in Robert's benchmarks, but I don't think we >> should go very far towards enabling sortsupport for text until we >> understand the issue and know whether we need to add more infrastructure >> for it. If you cross your eyes a little bit, this is very much like >> the strxfrm question... > > I see the parallels. The problem with pre-detoasting to save comparison cycles is that you can now fit many, many fewer tuples in work_mem. There might be cases where it wins (for example, because the entire data set fits even after decompressing everything) but in most cases it seems like a loser. Also, my guess is that most values people sort by are pretty short, making this concern mostly academic. Suppose you are sorting a bunch of strings which might be either 100 characters in length or 1MB. If they're all 100 characters, you probably don't need to detoast. If they're all 1MB, you probably can't detoast without eating up a ton of memory (and even if you have it, this might not be the best use for it). If you have a mix, detoasting might be affordable provided that the percentage of long strings is small, but it's also not going to save you much, because if the percentage of long strings is small, then most comparisons will be between two short strings where we don't save anything anyway. All things considered, this seems to me to be aiming at a pretty narrow target, but maybe I'm just not thinking about it creatively enough. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture
On Wed, Jun 13, 2012 at 7:27 AM, Andres Freund wrote: > === Design goals for logical replication === : > - in core > - fast > - async > - robust > - multi-master > - modular > - as unintrusive as possible implementation wise > - basis for other technologies (sharding, replication into other DBMSs, ...) I agree with all of these goals except for "multi-master". I am not sure that there is a need to have a multi-master replication solution in core. The big tricky part of multi-master replication is conflict resolution, and that seems like an awful lot of logic to try to build into core - especially given that we will want it to be extensible. More generally, I would much rather see us focus on efficiently extracting changesets from WAL and efficiently applying those changesets on another server. IMHO, those are the things that are holding back the not-in-core replication solutions we have today, particularly the first. If we come up with a better way of applying change-sets, well, Slony can implement that too; they are already installing C code. What neither they nor any other non-core solution can implement is change-set extraction, and therefore I think that ought to be the focus. To put all that another way, I think it is a 100% bad idea to try to kill off Slony, Bucardo, Londiste, or any of the many home-grown solutions that are out there to do replication. Even if there were no technical place for third-party replication products (and I think there is), we will not win many friends by making it harder to extend and add on to the server. If we build an in-core replication solution that can be used all by itself, that is fine with me. But I think it should also be able to expose its constituent parts as a toolkit for third-party solutions. 
> While you may argue that most of the above design goals are already provided > by > various trigger-based replication solutions like Londiste or Slony, we think > that that's not enough for various reasons: > > - not in core (and thus less trustworthy) > - duplication of writes due to an additional log > - performance in general (check the end of the above presentation) > - complex to use because there is no native administration interface I think that your parenthetical note "(and thus less trustworthy)" gets at another very important point, which is that one of the standards for inclusion in core is that it must in fact be trustworthy enough to justify the confidence that users will place in it. It will NOT benefit the project to have two replication solutions in core, one of which is crappy. More, even if what we put in core is AS GOOD as the best third-party solutions that are available, I don't think that's adequate. It has to be better. If it isn't, there is no excuse for preempting what's already out there. I imagine you are thinking along similar lines, but I think that it bears being explicit about. > As we need a change stream that contains all required changes in the correct > order, the requirement for this stream to reflect changes across multiple > concurrent backends raises concurrency and scalability issues. Reusing the > WAL stream for this seems a good choice since it is needed anyway and addresses > those issues already, and it further means that we don't incur duplicate > writes. Any other stream-generating component would introduce additional > scalability issues. Agreed. > We need a change stream that contains all required changes in the correct > order which thus needs to be synchronized across concurrent backends which > introduces > obvious concurrency/scalability issues.
> Reusing the WAL stream for this seems a good choice since it is needed anyway > and addresses those issues already, and it further means we don't duplicate the > writes and locks already performed for its maintenance. Agreed. > Unfortunately, in this case, the WAL is mostly a physical representation of > the > changes and thus does not, by itself, contain the necessary information in a > convenient format to create logical changesets. Agreed. > The biggest problem is that interpreting tuples in the WAL stream requires an > up-to-date system catalog and needs to be done in a compatible backend and > architecture. The requirement of an up-to-date catalog could be solved by > adding more data to the WAL stream but it seems likely that that would > require relatively intrusive & complex changes. Instead we chose to require a > synchronized catalog at the decoding site. That adds some complexity to use > cases like replicating into a different database or cross-version > replication. For those it is relatively straightforward to develop a proxy pg > instance that only contains the catalog and does the transformation to textual > changes. The actual requirement here is more complex than "an up-to-date catalog". Suppose transaction X begins, adds a column to a table, inserts a row, and commits. That tuple needs to be interpreted using th
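The catalog-versioning problem described above can be modeled in a few lines of Python. This is a toy sketch with invented names, not the actual decoding code: a tuple written by transaction X after its ALTER TABLE can only be interpreted with the catalog version that X itself created.

```python
# Two hypothetical catalog snapshots for one table: before and after
# transaction X runs ALTER TABLE ... ADD COLUMN note.
catalog_v1 = ["id"]           # columns before transaction X
catalog_v2 = ["id", "note"]   # columns after X's ALTER TABLE

def decode_tuple(raw_values, columns):
    """Interpret a raw WAL tuple using a given catalog snapshot."""
    if len(raw_values) != len(columns):
        raise ValueError("catalog snapshot does not match tuple layout")
    return dict(zip(columns, raw_values))

# X's INSERT produces a two-attribute tuple in WAL; decoding it with the
# old catalog fails, while the post-ALTER catalog interprets it correctly.
wal_insert = (42, "hello")
print(decode_tuple(wal_insert, catalog_v2))  # {'id': 42, 'note': 'hello'}
```

The point of the sketch is only that decoding needs the catalog as it stood when the record was written, which is why a "synchronized catalog at the decoding site" (or a proxy instance holding one) is part of the design.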
Re: [HACKERS] libpq compression
On Thu, Jun 14, 2012 at 9:57 AM, Phil Sorber wrote: > It doesn't sound like there is a lot of support for this idea, but I > think it would be nice to get something like lz4 > (http://code.google.com/p/lz4/) or snappy > (http://code.google.com/p/snappy/) support. Both are BSD-ish licensed. > It could be useful for streaming replication as well. A hook (as Euler > mentioned) might be a nice compromise. There is a lot of support for the idea: it's one of the more requested features. I think a well-thought-out framework that bypassed the dependency issues via plugging might get some serious traction. Emphasis on 'well thought out' :-). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minimising windows installer password confusion
On Thu, Jun 14, 2012 at 11:43 AM, Dave Page wrote: > > I'll have a play with it and see if a simple switch to NetworkService > seems feasible. OK, I worked up a patch which uses "NT AUTHORITY\NetworkService" as the service account by default. This doesn't need a password, so allows us to simply prompt during installation for the superuser password for the cluster, and not at all during upgrade. If you run the installer from the command line with "--serviceaccount postgres" (or some other account name), you get the current behaviour. I've posted it on our internal ReviewBoard system for the rest of the team to review and test on various platforms (I've only tried it on XP so far). Now would be a very good time for people to yelp if they think this is a bad idea (the only downside I can see if accessing files for server-side copy etc, but users that want to do that can install with a non-default account). If we go ahead, we'll include it in 9.2. Thanks for the feedback folks. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Allow WAL information to recover corrupted pg_controldata
I am planning to work on the below Todo list item for this CommitFest: Allow WAL information to recover corrupted pg_controldata http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php I wanted to confirm my understanding of the work involved for this patch. The existing patch has the following problems: 1. Memory leak, and the linked list code path is not correct 2. The lock check for whether the server is already running is removed in the patch, which needs to be reverted 3. The code needs refactoring. Apart from the above, what I understood from the patch is that its intention is to generate values for ControlFile using the WAL logs when the -r option is used. The change from the current algorithm will be: if the control file is corrupt, which essentially means ReadControlFile() returns false, then generate the values (checkPointCopy, checkPoint, prevCheckPoint, state) from the WAL if the -r option is enabled. Also, with the -r option, there is no need to call FindEndOfXLOG(), as that work is achieved by the above point. It will just rewrite the control file and not perform the other resets. The algorithm for restoring the pg_control values from old xlog files: 1. Retrieve all of the active xlog files from the xlog directory into a list in increasing order, according to their timeline, log id and segment id. 2. Search the list to find the oldest xlog file of the latest timeline. 3. Search the records from the oldest xlog file of the latest timeline to the latest xlog file of the latest timeline; if a checkpoint record is found, update the latest and previous checkpoint. Apart from the above, some changes in code will be required after the XLog patch by Heikki. Please let me know whether my understanding is correct.
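Steps 1 and 2 of the restore algorithm above can be sketched in Python. The segment names and helper below are illustrative only; they assume the standard 24-hex-digit WAL segment naming, which encodes timeline, log id and segment id.

```python
def parse_segment(name):
    """Split a 24-hex-digit WAL segment name into (timeline, log, seg)."""
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)

# Hypothetical contents of the xlog directory
segments = [
    "000000010000000000000003",
    "000000020000000000000005",
    "000000020000000000000004",
]

# Step 1: order by (timeline, log id, segment id)
ordered = sorted(segments, key=parse_segment)

# Step 2: oldest segment belonging to the latest timeline
latest_tli = max(parse_segment(s)[0] for s in ordered)
on_latest = [s for s in ordered if parse_segment(s)[0] == latest_tli]
oldest_of_latest = on_latest[0]

# Step 3 (not shown) would scan records forward from oldest_of_latest,
# remembering the last checkpoint record seen.
print(oldest_of_latest)
```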
Re: [HACKERS] sortsupport for text
On 18 March 2012 15:08, Tom Lane wrote: > One other thing I've always wondered about in this connection is the > general performance of sorting toasted datums. Is it better to detoast > them in every comparison, or pre-detoast to save comparison cycles at > the cost of having to push much more data around? I didn't see any > discussion of this point in Robert's benchmarks, but I don't think we > should go very far towards enabling sortsupport for text until we > understand the issue and know whether we need to add more infrastructure > for it. If you cross your eyes a little bit, this is very much like > the strxfrm question... I see the parallels. I note that glibc's strcoll_l() is implemented entirely in C (strcoll() itself is implemented in terms of strcoll_l() ), whereas the various strcmp.S are written in hand-optimized assembler, with SSE3 instructions in the "Highly optimized version for x86-64", for example. I wonder just how important a factor that is. I suppose the reason why the glibc guys haven't just done something equivalent internally might be that they much prefer to perform the comparison in-place, due to the need to target a conservative lowest common denominator...or it could be because it just doesn't matter that much. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
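The strxfrm trade-off Tom raises can be illustrated with Python's locale module, which exposes both C-library primitives: strcoll pays the collation cost on every comparison (analogous to detoasting in each comparator call), while strxfrm transforms each key once so that later comparisons are cheap byte comparisons, at the cost of pushing around larger transformed keys. This is only an analogy for the sorting question, not PostgreSQL code.

```python
import locale
from functools import cmp_to_key

words = ["pear", "apple", "Banana", "cherry"]

# Compare-in-place: the locale-aware comparison runs for every pair.
by_coll = sorted(words, key=cmp_to_key(locale.strcoll))

# Pre-transform: one strxfrm per element up front, then plain
# byte-wise comparisons on the transformed keys.
by_xfrm = sorted(words, key=locale.strxfrm)

# Both strategies must yield the same order; they differ only in
# where the collation work is spent.
assert by_coll == by_xfrm
print(by_xfrm)
```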
Re: [HACKERS] [PATCH 02/16] Add zeroRecPtr as a shortcut for initializing a local variable to {0, 0}
On Thursday, June 14, 2012 04:04:22 PM Robert Haas wrote: > On Thu, Jun 14, 2012 at 9:57 AM, Andres Freund wrote: > > On Thursday, June 14, 2012 03:50:28 PM Robert Haas wrote: > >> On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund > > > > wrote: > >> > This is locally defined in lots of places and would get introduced > >> > frequently in the next commits. It is expected that this can be > >> > defined in a header-only manner as soon as the XLogInsert scalability > >> > groundwork from Heikki gets in. > >> > >> This appears to be redundant with the existing InvalidXLogRecPtr, > >> defined in access/transam.h. > > > > Oh. I didn't find that one. Judging from all the code defining local > > variants of it I am not alone in that... Why is it in transam.h and not > > xlogdefs.h? > > Uh, not sure. We used to have a variable by that name defined in a > bunch of places, and I cleaned it up some in commit > 503c7305a1e379f95649eef1a694d0c1dbdc674a. But if there are still more > redundant definitions floating around, it would be nice to clean those > up. Forget it, they are in things that don't link to the backend... /me looks forward to the 64bit conversion of XLogRecPtr's. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] unlink for DROPs after releasing locks (was Re: Should I implement DROP INDEX CONCURRENTLY?)
On Thu, Jun 14, 2012 at 10:45 AM, Tom Lane wrote: > Robert Haas writes: >> Is RESOURCE_RELEASE_AFTER_LOCKS actually used for anything? Is it >> just for extensions? > > I'm too lazy to go look, but it certainly ought to be in use. > The idea is that that's the phase for post-lock-release cleanup, > and anything that can possibly be postponed till after releasing > locks certainly should be ... Oh, you're right. I missed the logic in ResourceOwnerReleaseInternal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] libpq compression
On Thu, Jun 14, 2012 at 10:14 AM, Florian Pflug wrote: > On Jun14, 2012, at 15:28 , Euler Taveira wrote: >> On 14-06-2012 02:19, Tom Lane wrote: >>> I still think that pushing this off to openssl (not an ssh tunnel, but >>> the underlying transport library) would be an adequate solution. >>> If you are shoving data over a connection that is long enough to need >>> compression, the odds that every bit of it is trustworthy seem pretty >>> small, so you need encryption too. >>> >> I don't want to pay the SSL connection overhead. Also I just want >> compression, >> encryption is not required. OpenSSL gives us encryption with/without >> compression; we need an option to obtain compression in non-SSL connections. > > AFAIR, openssl supports a NULL cipher which doesn't do any encryption. We > could have a connection parameter, say compress=on, which selects that > cipher (unless sslmode is set to prefer or higher, of course). > > SSL NULL-cipher connections would be treated like unencrypted connections > when matching against pg_hba.conf. > > best regards, > Florian Pflug It doesn't sound like there is a lot of support for this idea, but I think it would be nice to get something like lz4 (http://code.google.com/p/lz4/) or snappy (http://code.google.com/p/snappy/) support. Both are BSD-ish licensed. It could be useful for streaming replication as well. A hook (as Euler mentioned) might be a nice compromise.
Re: [HACKERS] unlink for DROPs after releasing locks (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Robert Haas writes: > Is RESOURCE_RELEASE_AFTER_LOCKS actually used for anything? Is it > just for extensions? I'm too lazy to go look, but it certainly ought to be in use. The idea is that that's the phase for post-lock-release cleanup, and anything that can possibly be postponed till after releasing locks certainly should be ... regards, tom lane
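Tom's description of the release phases can be modeled as a toy Python sketch. The phase names mirror the C constants; everything else is invented. Cleanup registered for the after-locks phase runs only once the lock-release phase has completed, which is the whole point of having a separate phase.

```python
# Phased resource release: callbacks fire strictly in phase order,
# so anything postponable until after lock release goes in AFTER_LOCKS.
BEFORE_LOCKS, LOCKS, AFTER_LOCKS = range(3)

log = []
callbacks = {BEFORE_LOCKS: [], LOCKS: [], AFTER_LOCKS: []}

def register(phase, fn):
    callbacks[phase].append(fn)

def release_all():
    for phase in (BEFORE_LOCKS, LOCKS, AFTER_LOCKS):
        for fn in callbacks[phase]:
            fn()

# e.g. unlinking dropped files is safe to defer past lock release
register(AFTER_LOCKS, lambda: log.append("unlink files"))
register(LOCKS, lambda: log.append("release locks"))

release_all()
print(log)  # ['release locks', 'unlink files']
```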
Re: [HACKERS] Backup docs
On Wed, Jun 13, 2012 at 3:20 PM, Dimitri Fontaine wrote: > Please let's apply that documentation patch to 9.2 too. Agreed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] unlink for DROPs after releasing locks (was Re: Should I implement DROP INDEX CONCURRENTLY?)
On Sun, Jun 10, 2012 at 5:37 PM, Tom Lane wrote: > Robert Haas writes: >> On Sun, Jun 10, 2012 at 4:19 PM, Noah Misch wrote: >>> Agreed. We now have $OLD_SUBJECT, but this is a win independently. I have >>> reviewed the code that runs between the old and new call sites, and I did >>> not >>> identify a hazard of moving it as you describe. > >> I looked at this when we last discussed it and didn't see a problem >> either, so I tend to agree that we ought to go ahead and do this, > > +1, as long as you mean in 9.3 not 9.2. I don't see any risk either, > but the time for taking new risks in 9.2 is past. > > Noah, please add this patch to the upcoming CF, if you didn't already. I re-reviewed this and committed it. Is RESOURCE_RELEASE_AFTER_LOCKS actually used for anything? Is it just for extensions? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink
> That seems undesirable. I don't think this is important enough to be worth reparsing > the connection string for. The connect string should not be long, so parsing is not a big cost performance wise. I have currently modified the code for dblink in the patch I have uploaded in CF. However as per your suggestion, I will remove it during handling of other Review comments for patch unless somebody asks to keep it. -Original Message- From: Robert Haas [mailto:robertmh...@gmail.com] Sent: Thursday, June 14, 2012 7:25 PM To: Amit Kapila Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink On Wed, Jun 13, 2012 at 12:07 AM, Amit Kapila wrote: > >> Why not 'dblink'? > > We can do for dblink as well. I just wanted to check before implementing in > dblink. > > I have checked the dblink_connect() function, it receives the connect string > and used in most cases that string to > call libpq connect which is different from pgbench and oid2name where > connection parameters are formed in main function and then call libpq > connect. > > To achieve the same in dblink, we need to parse the passed connection string > and check if it contains fallback_application_name, if yes then its okay, > otherwise we need to append fallback_application_name in connection string. That seems undesirable. I don't think this is important enough to be worth reparsing the connection string for. I'd just forget about doing it for dblink if there's no cheaper way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq compression
On Jun14, 2012, at 15:28 , Euler Taveira wrote: > On 14-06-2012 02:19, Tom Lane wrote: >> I still think that pushing this off to openssl (not an ssh tunnel, but >> the underlying transport library) would be an adequate solution. >> If you are shoving data over a connection that is long enough to need >> compression, the odds that every bit of it is trustworthy seem pretty >> small, so you need encryption too. >> > I don't want to pay the SSL connection overhead. Also I just want compression, > encryption is not required. OpenSSL gives us encryption with/without > compression; we need an option to obtain compression in non-SSL connections. AFAIR, openssl supports a NULL cipher which doesn't do any encryption. We could have a connection parameter, say compress=on, which selects that cipher (unless sslmode is set to prefer or higher, of course). SSL NULL-cipher connections would be treated like unencrypted connections when matching against pg_hba.conf. best regards, Florian Pflug
Re: [HACKERS] [PATCH 02/16] Add zeroRecPtr as a shortcut for initializing a local variable to {0, 0}
On Thu, Jun 14, 2012 at 9:57 AM, Andres Freund wrote: > On Thursday, June 14, 2012 03:50:28 PM Robert Haas wrote: >> On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund > wrote: >> > This is locally defined in lots of places and would get introduced >> > frequently in the next commits. It is expected that this can be defined >> > in a header-only manner as soon as the XLogInsert scalability groundwork >> > from Heikki gets in. >> >> This appears to be redundant with the existing InvalidXLogRecPtr, >> defined in access/transam.h. > Oh. I didn't find that one. Judging from all the code defining local variants > of it I am not alone in that... Why is it in transam.h and not xlogdefs.h? Uh, not sure. We used to have a variable by that name defined in a bunch of places, and I cleaned it up some in commit 503c7305a1e379f95649eef1a694d0c1dbdc674a. But if there are still more redundant definitions floating around, it would be nice to clean those up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] creating objects in pg_catalog
On Thu, Jun 7, 2012 at 8:44 AM, Robert Haas wrote: > On Wed, Jun 6, 2012 at 5:21 PM, Tom Lane wrote: >> Robert Haas writes: >>> rhaas=# create table pg_catalog.tom (a int); >>> ERROR: permission denied to create "pg_catalog.tom" >> >>> The offending error check is in heap_create(), and based on what >>> you're saying here it seems like we should just rip it out. >> >> Hmm. Yeah, it seems like the regular permissions tests on the schemas >> in question should be enough to keep Joe User from making tables there, >> and I do not see a reason why the backend would care if there are >> non-catalog tables laying about in pg_catalog. >> >> Checking the commit history, it seems this was originally a test to >> prevent people from creating tables named "pg_xxx": >> >> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f5a10e722c052006886b678995695001958a#patch3 >> >> which may or may not have been critical once upon a time, but surely is >> not any more. >> >> So no objection to removing that particular test. > > Patch attached. Barring objections, I'll apply this to 9.3 once we branch. Hearing no objections, done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 02/16] Add zeroRecPtr as a shortcut for initializing a local variable to {0, 0}
On Thursday, June 14, 2012 03:50:28 PM Robert Haas wrote: > On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund wrote: > > This is locally defined in lots of places and would get introduced > > frequently in the next commits. It is expected that this can be defined > > in a header-only manner as soon as the XLogInsert scalability groundwork > > from Heikki gets in. > > This appears to be redundant with the existing InvalidXLogRecPtr, > defined in access/transam.h. Oh. I didn't find that one. Judging from all the code defining local variants of it I am not alone in that... Why is it in transam.h and not xlogdefs.h? Obviously that patch is void then. Doesn't warrant rebasing the other patches yet though... Thanks! Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink
On Wed, Jun 13, 2012 at 12:07 AM, Amit Kapila wrote: > >> Why not 'dblink'? > > We can do for dblink as well. I just wanted to check before implementing in > dblink. > > I have checked the dblink_connect() function, it receives the connect string > and used in most cases that string to > call libpq connect which is different from pgbench and oid2name where > connection parameters are formed in main function and then call libpq > connect. > > To achieve the same in dblink, we need to parse the passed connection string > and check if it contains fallback_application_name, if yes then its okay, > otherwise we need to append fallback_application_name in connection string. That seems undesirable. I don't think this is important enough to be worth reparsing the connection string for. I'd just forget about doing it for dblink if there's no cheaper way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
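The approach Amit describes, and Robert considers too expensive to bother with for dblink, amounts to something like the following naive sketch. The helper name is invented, and real libpq conninfo strings also allow quoting and escaping, which this deliberately ignores.

```python
def with_fallback_app_name(conninfo, app_name):
    """Append fallback_application_name to a conninfo string if absent.

    Naive keyword scan over space-separated key=value pairs; a real
    implementation would reuse libpq's conninfo parser.
    """
    keys = [part.split("=", 1)[0] for part in conninfo.split() if "=" in part]
    if "fallback_application_name" in keys:
        return conninfo
    return conninfo + " fallback_application_name=" + app_name

print(with_fallback_app_name("host=localhost dbname=test", "dblink"))
# host=localhost dbname=test fallback_application_name=dblink
```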
Re: [HACKERS] Restrict ALTER FUNCTION CALLED ON NULL INPUT (was Re: Not quite a security hole: CREATE LANGUAGE for non-superusers)
On Tue, Jun 12, 2012 at 03:13:26PM -0400, Tom Lane wrote: > Even if I accepted that premise, this patch is a pretty bad > implementation of it, because it restricts cases that there is no > reason to think are unsafe. > > A less bizarre and considerably more future-proof restriction, IMO, > would simply refuse any attempt to give ownership of a C function > to a non-superuser. That just moves the collateral damage to a different set of victims. Hardcoding the list of vulnerable languages isn't so "future-proof". I'd otherwise agree in principle if we were designing a system from scratch, but it doesn't fit with the need to harmoniously protect existing systems. Adding this restriction now would make some existing databases fail to restore from dumps. That is grave enough at any juncture, let alone for a backpatched fix. On Tue, Jun 12, 2012 at 04:14:48PM -0400, Tom Lane wrote: > Basically, if we go down the road Noah is proposing, I foresee a steady > stream of security bugs and ensuing random restrictions on what function > owners can do. I do not like that future. Then let's have every ALTER FUNCTION require the same language usage check as CREATE FUNCTION. Or, if you insist, do so only for the hardcoded cases of INTERNALlanguageId and ClanguageId, and document that no third-party PL may rely on STRICT to the extent they do. This of course forbids more than necessary but still strictly less than your proposal of blocking the original ownership change. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 02/16] Add zeroRecPtr as a shortcut for initializing a local variable to {0, 0}
On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund wrote: > This is locally defined in lots of places and would get introduced frequently > in the next commits. It is expected that this can be defined in a header-only > manner as soon as the XLogInsert scalability groundwork from Heikki gets in. This appears to be redundant with the existing InvalidXLogRecPtr, defined in access/transam.h. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] \conninfo and SSL
On Sun, Jun 3, 2012 at 5:30 AM, Alastair Turner wrote: > A one-line change adds the SSL info on its own line like > > -- > You are connected to database "scratch" as user "scratch" on host > "127.0.0.1" at port "5432". > SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256) > -- > > Does this need a more integrated presentation, and therefore a broader > change to make it translatable? Committed to master. I didn't make it conditional on a non-local connection, though, since there seemed to be no reason to do it that way. P.S. Email mangles patches. It's better to attach them rather than including them inline. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] libpq compression
On 14-06-2012 02:19, Tom Lane wrote: > I still think that pushing this off to openssl (not an ssh tunnel, but > the underlying transport library) would be an adequate solution. > If you are shoving data over a connection that is long enough to need > compression, the odds that every bit of it is trustworthy seem pretty > small, so you need encryption too. > I don't want to pay the SSL connection overhead. Also I just want compression; encryption is not required. OpenSSL gives us encryption with/without compression; we need an option to obtain compression in non-SSL connections. > We do need the ability to tell openssl to use compression. We don't > need to implement it ourselves, nor to bring a bunch of new library > dependencies into our builds. I especially think that importing bzip2 > is a pretty bad idea --- it's not only a new dependency, but bzip2's > compression versus speed tradeoff is entirely inappropriate for this > use-case. > I see, the idea is that bzip2 would be a compiled-in option (not enabled by default) just to give another compression option. I don't have a strong opinion about including it as another dependency. We already depend on zlib, and implementing compression using it won't add another dependency. What do you think about adding a hook at libpq to load an extension that does the compression? That way we don't add another dependency at libpq, and a lot of extensions could be coded to cover a variety of algorithms without putting us in trouble because of patent infringement. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
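The compression-versus-speed trade-off under discussion is easy to probe with Python's standard bindings for the same two libraries. The payload below is invented and the numbers will vary by machine; the sketch only illustrates how one would compare ratio and time for zlib against bzip2 on protocol-like, highly repetitive traffic.

```python
import bz2
import time
import zlib

# Repetitive, wire-traffic-like payload (hypothetical)
payload = b"SELECT * FROM pg_class;\n" * 2000

t0 = time.perf_counter()
z = zlib.compress(payload, 6)          # zlib's default-ish level
tz = time.perf_counter() - t0

t0 = time.perf_counter()
b = bz2.compress(payload, 9)           # bzip2 at maximum compression
tb = time.perf_counter() - t0

print(f"original: {len(payload)} bytes")
print(f"zlib:  {len(z)} bytes in {tz:.4f}s")
print(f"bzip2: {len(b)} bytes in {tb:.4f}s")
```

Both round-trip losslessly; the interesting question for a wire protocol is how much extra CPU bzip2 spends for its ratio advantage, which this kind of measurement makes concrete.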
Re: [HACKERS] Ability to listen on two unix sockets
On Thu, Jun 14, 2012 at 1:04 AM, Tom Lane wrote: > Robert Haas writes: >> Maybe: > >> listen_addresses = { host | host:port | * | *:port } [, ...] >> unix_socket_directory = { directory | directory:port ] [,...] > >> ...except that colon is a valid character in a directory name. Not >> sure what to do about that. > > Do we need to do anything? There are no standard scenarios in which a > colon would appear in such paths, and I don't see why we need to cater > for it. (Remember that Windows doesn't enter into this ...) True. And, we should be able to design the parsing algorithm so that they can fix it by adding an otherwise-redundant port specification - i.e. if you want to put the socket in a directory called /me:1, then write /me:1:5432 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
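The parsing rule Robert sketches, where a trailing all-digits component is treated as the port and anything else leaves the whole string as the directory, could look like the following (the helper name and default are assumptions for illustration):

```python
def split_socket_dir(spec, default_port=5432):
    """Split 'directory[:port]' where colons are legal in the directory.

    If the text after the last colon is all digits, treat it as a port;
    otherwise the entire spec is the directory. A directory that itself
    ends in ':digits' must carry an explicit port, e.g. '/me:1:5432'.
    """
    head, sep, tail = spec.rpartition(":")
    if sep and tail.isdigit():
        return head, int(tail)
    return spec, default_port

print(split_socket_dir("/me:1:5432"))  # ('/me:1', 5432)
```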
Re: [HACKERS] [PATCH] Support for foreign keys with arrays
On Wed, 13/06/2012 at 18.07 -0400, Noah Misch wrote: > On Wed, Jun 13, 2012 at 10:12:18PM +0200, Gabriele Bartolini wrote: > > Our goal is to work on this patch from the next commit fest. > > > > What we are about to do for this commit fest is to split the previous > > patch and send a small one just for the array_remove() and > > array_replace() functions. > > > > Then we will sit down and organise the development of the feature > > according to Peter's feedback. It is important indeed that we find a > > commonly accepted terminology and syntax for this feature. > > > > I hope this sounds like a reasonable plan. Thank you. > > Sounds fine. The 2012-01 CF entry for this patch has been moved to the > 2012-06 CF. Please mark that entry Returned with Feedback and create a new > entry for the subset you're repackaging for this CommitFest. > > Thanks, > nm > Dear Noah, I have added a new patch to the commitfest (miscellaneous category) entitled "Support for array_remove and array_replace functions". I have also marked the "Foreign Key Array" patch as "Returned with Feedback" as per your request. We will start working again on this patch in the next commitfest as anticipated by Gabriele, hoping that the array functions will be committed by then. Thank you. Cheers, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] Minimising windows installer password confusion
On Thu, Jun 14, 2012 at 12:55 AM, Craig Ringer wrote: > On 06/13/2012 05:14 PM, Dave Page wrote: >> >> On Wed, Jun 13, 2012 at 2:18 AM, Craig Ringer >> wrote: >>> >>> On 06/12/2012 08:08 PM, Dave Page wrote: Some background: By default the installer will use 'postgres' for both the service (OS) account, and the database superuser account. It will use the same password for both (though, users have complete control at the command line if they want it, which is why the account names are shown in the installer). Unlike *nix installations, the service account must have a password, which is required during both installation and upgrade to configure the PostgreSQL service. We therefore ask for the password during both installation and upgrade. If the user account exists (which can happen if there has previously been an installation of Postgres on the system), the user must specify the existing password for the account during installation (and of course, during all upgrades). This is where users normally get stuck, as they've forgotten the password they set in the past. >>> >>> They may not even have set it, because the EnterpriseDB installer can be >>> run >>> unattended and may've been used by some 3rd party package. >>> >>> I'd be interested in seeing what Microsoft installers that create >>> isolated >>> user accounts do. I think .NET creates one for ASP, so that'd be one >>> option >>> to look into. >> >> They tend to use the localsystem or networkservice accounts for most >> things, which don't have passwords. The reason we don't do that is >> that since the early days of 8.0 we've said we want to enable users to >> use the service account as if it were a regular account, so that they >> can do things like access network resources (useful for server-side >> copy for example). >> >> I wasn't overly convinced back then that that was necessary, and I'm >> still not now. 
I suppose we potentially could use the networkservice >> account by default, and let advanced users override that on the >> command line if they need to. Then remembering the password becomes >> their responsibility. > > +1 from me on this, though I think making the service account part of the > install process makes sense. > > SERVICE ACCOUNT > - > > You can probably just accept the default here. > > PostgreSQL runs in the background as a network service in a user account > that > only has limited access to the files and programs on the computer. This is > fine > for most purposes, but will prevent certain operations like server-side COPY > and > direct access by the server to resources on shared folders from working. If > you > need these capabilities, we recommend that you create a "postgres" user > account > below and have the PostgreSQL server run in that instead of the default > NetworkService account. > > -- [service account] --- > | | > | [*] LocalService | > | | > | [ ] Custom Service Account | > | | > | -- [custom account]--| | > | | | | > | | Username: [ ] | | > | | Password: [ ] | | > | | Domain: [ THISCOMPUTER ] | | > | | | | > | || | > |--| > > Reasonable option? Too complicated - it'll confuse users too (and it goes against the primary goal of the installers which is to setup the server in a suitable way for production use, with the absolute minimum of user interaction). We try to provide more complex config options that 99% of users won't need through the command line only (though, in the future I guess it might make sense to have a "Show advanced options" checkbox on the first page, though that's definitely out of scope for 9.2). I'll have a play with it and see if a simple switch to NetworkService seems feasible. 
--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Support for array_remove and array_replace functions
Hi,

Following Gabriele's email regarding our previous patch on "Foreign Key Arrays" [1], I am sending a subset of that patch which includes only the two array functions that will be needed by it: array_remove (limited to single-dimensional arrays) and array_replace. The patch includes changes to the documentation.

Cheers,
Marco

[1] http://archives.postgresql.org/message-id/4FD8F422.40709%402ndQuadrant.it

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it

[Attachment: array-functions.patch.bz2 (application/bzip)]
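As a sketch of the semantics described for the two proposed functions, here is a hypothetical Python model. The Python helpers below are illustrative only (the patch itself implements these as C functions in the backend), and they cover only the one-dimensional case the patch targets:

```python
# Illustrative model of the two proposed SQL functions; these helpers
# only mirror the described behaviour for one-dimensional arrays and
# are not part of the patch.

def array_remove(arr, elem):
    """Return arr with every occurrence of elem removed."""
    return [x for x in arr if x != elem]

def array_replace(arr, old, new):
    """Return arr with every occurrence of old replaced by new."""
    return [new if x == old else x for x in arr]

print(array_remove([1, 2, 3, 2], 2))      # [1, 3]
print(array_replace([1, 2, 3, 2], 2, 9))  # [1, 9, 3, 9]
```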
Re: [HACKERS] Patch pg_is_in_backup()
Hello Gilles,

thank you very much for your patience (and thank you, Marco, for supporting me). I apologise for the delay. I have retested the updated patch and it works fine for me. It is "ready for committer" as far as I'm concerned.

Cheers,
Gabriele

On 14/06/12 11:36, Marco Nenciarini wrote:
> Hi Gilles,
> unfortunately Gabriele has been very busy recently and he asked me to
> check again the status of this patch for this commit fest. In order to
> speed up the application of the patch, I am sending an updated version
> which compiles correctly against the current head. Gabriele will then
> proceed with the review.
> Thank you,
> Marco

--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] Patch pg_is_in_backup()
Hi Gilles,

unfortunately Gabriele has been very busy recently and he asked me to check again the status of this patch for this commit fest. In order to speed up the application of the patch, I am sending an updated version which compiles correctly against the current head. Gabriele will then proceed with the review.

Thank you,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it

[Attachment: pg_backup_start_time-and-pg_is_in_backup-v4.1.patch.bz2 (application/bzip)]
Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
Tom Lane writes:
> Well, TBH I didn't think that concept was a useful solution for this at
> all. I can't imagine that we would define "features" at a sufficiently
> fine granularity, or with enough advance foresight, to solve the sort of
> problem that's being faced here. How would you deal with the need for,
> say, some of contrib/xml2's functions to get migrated to core in 9.4 or
> so? When you didn't know that would happen, much less exactly which
> functions, when 9.3 came out?

You can provide and require new feature names on the same dependency graph, and the patch allows us to do so without foresight. The trade-off is that we would need to edit the previous version of PostgreSQL to add the new feature names. In your example, we would add a feature name to the contrib/xml2 extension in a 9.3 minor release, and add the same feature to the 9.4 core distribution. That requires a minor upgrade before the major upgrade for us to have a chance to deal with the lack of foresight.

> AFAICS the only way that "features" could fix this would be if we always
> created a feature for every exposed function, which is unmanageable.

Agreed; exposing each function as a feature is not the way to go.

> AFAICS, the SQL-standard features list is just about entirely irrelevant
> to this discussion. How often is it going to happen that we implement a
> standard feature in a contrib module before migrating it into core?

I wanted the standard to help me with the "core features" idea; I agree it's not a smart move here.

> I think almost every case in which we'll face this problem will involve
> a PG-specific feature not mentioned in the SQL feature taxonomy. The
> case at hand (some proposed new functions for managing replication)
> certainly isn't going to be found there.

Agreed.

> And, quite aside from whether we could invent feature names that match
> what we want to move from contrib to core, exactly how would having a
> feature name help? The problem we're actually facing is getting
> pg_upgrade to not dump particular functions when it's doing a
> binary-upgrade dump of an extension. Maybe I've forgotten, but I do not
> recall any exposed connection between feature names and particular SQL
> objects in your proposal.

We're talking about several distinct problems here: one is the low-level upgrade mechanism that preserves OIDs, and the other is upgrading to a newer version of the same extension whose content has changed. And we want to do both, of course.

The idea would be to implement the upgrade as you're proposing, or at least as I understand it, which is to just issue a CREATE EXTENSION command as part of the schema creation. That will run the new extension script, which will know not to install deprecated functions. The problem with that is preserving OIDs that might be stored in user relations, such as types.

If we want pg_upgrade to cope with upgrading an extension to new content, we have to change its implementation. One idea would be to add hooks in the backend code at OID assignment time so that pg_upgrade can attach code that provides the OID. As input it should have creating_extension, the extension name, the schema name and the object name, and also the object's "argument list" for more complex things such as functions and aggregates.

I don't think the "extension features" dependency-management patch should, by itself, be fixing the problem of running the newer extension script while keeping the existing OIDs.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
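The retroactive-feature-name idea can be reduced to a small sketch. Everything below is invented for illustration (the feature name, the sets, the checker function); the point is only that a dependency check needs the *name*, not foresight about which release or module provides it:

```python
# Hypothetical sketch of "feature names on a dependency graph": a 9.3.x
# minor release can start providing a feature name from contrib, and 9.4
# can provide the same name from core after the functions migrate.
# The name "xml_legacy_functions" is made up for this example.

def missing_features(required, provided):
    """Return the feature names an installation fails to provide."""
    return set(required) - set(provided)

# After the 9.3.x minor release, contrib/xml2 provides the name...
pg93_provides = {"core", "xml_legacy_functions"}
# ...and in 9.4 the same name is provided by core instead.
pg94_provides = {"core", "xml_legacy_functions"}

# A dump requiring the feature resolves on either release; a 9.3.x
# installation *without* the minor update would fail the check.
print(missing_features({"xml_legacy_functions"}, pg93_provides))  # set()
print(missing_features({"xml_legacy_functions"}, pg94_provides))  # set()
print(missing_features({"xml_legacy_functions"}, {"core"}))
```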
Re: [HACKERS] libpq compression
Euler Taveira wrote:
> There has already been some discussion about compressing libpq data
> [1][2][3]. Recently, I faced a scenario that would become less
> problematic if we had compression support. The scenario is frequent data
> load (aka COPY) over slow/unstable links. It has to be executed on a few
> hundred PostgreSQL servers all over Brazil. Someone could argue that I
> could use an ssh tunnel to solve the problem, but (i) it is complex
> because it involves a different port in the firewall, and (ii) it's an
> opportunity to improve other scenarios, like reducing bandwidth
> consumption during replication or normal operation over slow/unstable
> links.

Maybe I'm missing something obvious, but shouldn't a regular SSL connection (sslmode=require) do what you are asking for? At least from OpenSSL 0.9.8 on, data is compressed by default. You don't need an extra port in the firewall for that.

Yours,
Laurenz Albe
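For the COPY-over-slow-links scenario, the suggestion amounts to a connection-string setting on the client side. A sketch (host, database, and user names below are placeholders, and it assumes the client and server OpenSSL builds include zlib with compression not disabled):

```shell
# Placeholder connection parameters; sslmode=require forces an SSL
# session, and an OpenSSL (>= 0.9.8) built with zlib will negotiate
# TLS-level compression for it by default -- no extra firewall port.
psql "host=db.example.com dbname=load_target user=loader sslmode=require"

# The data load then travels over the compressed channel, e.g.:
#   \copy big_table FROM 'big_table.csv' WITH (FORMAT csv)
```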
Re: [HACKERS] hint bit i/o reduction
> yeah -- but you still have to call:
> *) TransactionIdIsCurrentTransactionId(), which adds a couple of tests
>    (and a bsearch in the presence of subtransactions)
> *) TransactionIdIsInProgress(), which adds a few tests and an
>    out-of-line call in the typical case
> *) TransactionIdDidCommit(), which adds a few tests and two
>    out-of-line calls in the typical case
>
> and, while setting the hint bit:
> *) Several more tests, plus:
> *) TransactionIdGetCommitLSN() -- another out-of-line call and a test
> *) XLogNeedsFlush() -- two more out-of-line calls plus a few tests
> *) SetBufferCommitInfoNeedsSave() -- an out-of-line call, several more
>    tests
>
> adding a few instructions to the above probably won't be measurable

You are right: adding to the above path should not impact performance, and moreover all the checks and assignments you are proposing are backend-local, so there is no contention either.

Just one observation: in some cases it will do only point 1, or only points 1 and 2. For example:
1. xmin is committed and its hint bit is already set.
2. xmax's hint bit is not set, and it is still not visible to the current transaction.

In this case it will do only points 1 and 2.

-----Original Message-----
From: Merlin Moncure [mailto:mmonc...@gmail.com]
Sent: Wednesday, June 13, 2012 8:36 PM
To: Amit Kapila
Cc: PostgreSQL-development
Subject: Re: [HACKERS] hint bit i/o reduction

On Wed, Jun 13, 2012 at 9:02 AM, Amit Kapila wrote:
>> Yes, but only in the unhinted case -- in oltp workloads tuples get
>> hinted fairly quickly so I doubt this would be a huge impact. Hinted
>> scans will work exactly as they do now. In the unhinted case for OLTP
>> a few tests are added but that's noise compared to the other stuff
>> going on.
>
> I believe the design you have proposed is for the unhinted cases only;
> that is, if the tuple already has the hint set then it will not go
> through the new logic. Is that right?
>
> In unhinted cases, the hint bit can be set in two ways:
> 1. It gets the transaction status from the clog memory buffers.
> 2. It needs to do I/O to get the transaction status from the clog.
>
> So, I think it will add overhead in case 1, where the processing is
> fast but now we will add some new instructions to it.

yeah -- but you still have to call:

*) TransactionIdIsCurrentTransactionId(), which adds a couple of tests
   (and a bsearch in the presence of subtransactions)
*) TransactionIdIsInProgress(), which adds a few tests and an
   out-of-line call in the typical case
*) TransactionIdDidCommit(), which adds a few tests and two out-of-line
   calls in the typical case

and, while setting the hint bit:

*) Several more tests, plus:
*) TransactionIdGetCommitLSN() -- another out-of-line call and a test
*) XLogNeedsFlush() -- two more out-of-line calls plus a few tests
*) SetBufferCommitInfoNeedsSave() -- an out-of-line call, several more
   tests

adding a few instructions to the above probably won't be measurable (naturally it would be tested to confirm that).

merlin
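The hinted-vs-unhinted fast path being discussed can be caricatured as follows. This is a hypothetical Python stand-in, not the backend's C visibility code: the function names mirror the C routines Merlin lists, but the toy "proc array" and "clog" sets and all constants are invented for illustration:

```python
# Hypothetical sketch of the xmin visibility fast path; in a real
# backend the stand-in functions below consult the proc array and the
# clog (the latter possibly doing I/O), and a successful commit lookup
# is when the hint bit would get set.

HEAP_XMIN_COMMITTED = 0x1   # the hint bit (toy value)

CURRENT_XID = 100           # toy "our own transaction"
IN_PROGRESS = {101}         # toy "proc array"
COMMITTED = {90, 95}        # toy "clog"

def xid_is_current(xid):      # TransactionIdIsCurrentTransactionId()
    return xid == CURRENT_XID

def xid_is_in_progress(xid):  # TransactionIdIsInProgress()
    return xid in IN_PROGRESS

def xid_did_commit(xid):      # TransactionIdDidCommit() -- may do I/O
    return xid in COMMITTED

def xmin_visible(infomask, xmin):
    if infomask & HEAP_XMIN_COMMITTED:
        return True           # hinted case: no further tests at all
    if xid_is_current(xmin):  # point 1: backend-local tests only
        return True
    if xid_is_in_progress(xmin):   # point 2: stops here, no clog access
        return False
    return xid_did_commit(xmin)    # point 3: clog lookup; hint set here
```

The hinted branch returns before any of the out-of-line calls, which is why extra instructions added after it are expected to be noise for OLTP workloads where tuples get hinted quickly.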