Re: [HACKERS] Null row vs. row of nulls in plpgsql
Iirc the reason for this fuzziness came from the SQL spec definition of IS
NULL for rows. As long as you maintain that level of spec-compliance I
don't think there are any other important constraints on pg behaviour.

greg

--sorry for the top posting but the phone makes it hard to do anything else.

On 27 Sep 2008, at 09:56 PM, Tom Lane [EMAIL PROTECTED] wrote:

> I looked a bit at the bug report here:
> http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php
>
> ISTM that the fundamental problem is that plpgsql doesn't distinguish
> properly between a null row value (eg, null::somerowtype) and a row of
> null values (eg, row(null,null,...)::somerowtype). When that code was
> designed, our main SQL engine was pretty fuzzy about the difference too,
> but now there is a clear semantic distinction.
>
> For plpgsql's RECORD variables this doesn't seem hard to fix: just take
> out the code in exec_move_row() that manufactures a row of nulls when
> the input is null, and maybe make a few small adjustments elsewhere.
>
> For ROW variables there's a bigger problem, because those are
> represented by a list of per-field variables, which doesn't immediately
> offer any way to represent overall nullness. I think it could be dealt
> with by adding an explicit "the row as a whole is null" flag to struct
> PLpgSQL_row. I haven't tried to code it though, so I'm not sure if
> there are gotchas or unreasonably large code changes needed to make it
> happen.
>
> I thought for a little bit about whether we couldn't get rid of ROW
> variables entirely, or at least make them work more like RECORD
> variables by storing a HeapTuple instead of a list of per-field
> variables. But I soon found out that the reason to have them is to be
> able to describe the assignment target of SQL statements that assign to
> multiple scalar variables, eg SELECT ... INTO x,y,z.
>
> Comments?
> regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
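The distinction Tom draws between a null row value (null::somerowtype) and a row of null values (row(null,null,...)::somerowtype) can be sketched outside plpgsql. This is a hypothetical Python model, not PostgreSQL source; the class and method names are illustrative only:

```python
# Hypothetical sketch: modeling the difference between a null row value
# (null::somerowtype) and a row of null values (row(null, null)::somerowtype),
# i.e. the "row as a whole is null" flag Tom proposes for PLpgSQL_row.

class RowValue:
    """A row variable that tracks overall nullness separately from fields."""
    def __init__(self, fields=None):
        # fields is None   -> the row as a whole is NULL (null::rowtype)
        # fields is a list -> a real row, possibly containing NULL fields
        self.fields = fields

    def is_null_row(self):
        return self.fields is None

    def is_row_of_nulls(self):
        return self.fields is not None and all(f is None for f in self.fields)

null_row = RowValue()                  # null::somerowtype
row_of_nulls = RowValue([None, None])  # row(null, null)::somerowtype

assert null_row.is_null_row() and not null_row.is_row_of_nulls()
assert row_of_nulls.is_row_of_nulls() and not row_of_nulls.is_null_row()
```

The point of the sketch is that the two states need distinct representations; collapsing one into the other is exactly what exec_move_row() currently does for RECORD variables.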
Re: [HACKERS] [REVIEW] Prototype: In-place upgrade v02
Hi,

I have gone through the following stuff:
1) previous emails on the patch
2) http://wiki.postgresql.org/wiki/In-place_upgrade
3) http://www.pgcon.org/2008/schedule/attachments/57_pg_upgrade_2008.pdf
4) http://wiki.postgresql.org/wiki/In-place_upgrade:Storage

Here is what I have understood so far (correct me if I am wrong). The
on-disk representation of data has changed from version to version over
the years. For some reason (performance, maybe) the newer versions of pg
were not backwards compatible, meaning that a newer version would not
read data written by an older version if the on-disk representation had
changed in between. The end user would be required to port the data
stored using the older version to the newer version's format using an
offline export/import. This project aims to upgrade from older to newer
versions on the fly. The on-disk representation is not the only change
that the system should accommodate; it should also accommodate catalog
changes, conf file changes, etc. Of the available design choices I think
you have chosen to go with on-line data conversion, meaning that pg would
now be aware of all the previous page layouts and, based on a switch on
page version, would handle each page layout. This will only be done to
read old data; newer data will be written in the newer format.

I am supposed to test the patch, and for that I have downloaded pg
versions 7.4, 8.0, 8.1, 8.2 and 8.3. I plan to create a data directory
using each of the versions and then try to read the same using 8.4 with
your patch applied. What database objects should I create in the test
database? Should I just create objects of my choice? Do the sizes (both
length and breadth) of tables matter? Do I have to perform performance
tests too?

Regards
Abbas

On Fri, 2008-09-19 at 14:28 +0200, Zdenek Kotala wrote:
> thanks
>
> Abbas napsal(a):
> > Even with that a hunk failed for bufpage.c, but I applied that part
> > manually to move on.
Regards
Abbas

On Thu, 2008-09-18 at 12:17 +0200, Zdenek Kotala wrote:
> Abbas napsal(a):
> > Hi, I downloaded the latest postgresql source code from
> > git clone git://git.postgresql.org/git/postgresql.git
> > and tried to apply the patch
> > http://archives.postgresql.org/pgsql-hackers/2008-09/gza1fGXLvf3L.gz
> > It does not apply cleanly; see the failures in the attached file.
>
> It clashes with the hash index patch which was committed four days ago.
> Try to use a little bit older revision from git (without the hash index
> modification).
>
> Zdenek
Re: [HACKERS] planned maintenance downtime - tribble.postgresql.org
Stefan Kaltenbrunner wrote:
> The sysadmin team would like to announce a planned maintenance window
> for OS-related updates on tribble.postgresql.org starting Sunday Sep 28
> 07:00 GMT (expected to last for an hour), affecting the following
> publicly visible services:
>
> cvs.postgresql.org
> wwwmaster.postgresql.org
> www.pgadmin.org
> doxygen.postgresql.org
> wiki.postgresql.org
>
> I would ask people to hold off on any changes or commits to the
> affected services during that time period until you see an explicit
> "it's done".

all done and services should be up again - if you notice any problems
please report back.

Stefan
Re: [HACKERS] Null row vs. row of nulls in plpgsql
On Sun, 2008-09-28 at 04:03 +0300, Greg Stark wrote:
> Iirc the reason for this fuzziness came from the SQL spec definition of
> IS NULL for rows. As long as you maintain that level of spec-compliance
> I don't think there are any other important constraints on pg behaviour.

What does the SQL spec say about recursive IS NULL for rows? Should we
check that IS NULL is true for each row element, or must they actually
be NULLs?

hannu=# select row(null, null) is NULL;
 ?column?
----------
 t
(1 row)

hannu=# select row(null, row(null, null)) is NULL;
 ?column?
----------
 f
(1 row)

--
Hannu
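Hannu's two queries illustrate a non-recursive rule: ROW(...) IS NULL looks only at the immediate fields, and a nested row value is itself a non-null field even when all of its own elements are null. A small Python sketch of that (assumed) rule, modeling a row as a tuple and NULL as None:

```python
# Sketch of the rule Hannu's psql examples suggest: ROW(...) IS NULL is
# true iff every *top-level* field is NULL, with no recursion into
# nested row values. (This models the observed behaviour, not the
# actual PostgreSQL evaluator.)

def row_is_null(row):
    # A nested row (tuple) is a non-null value even if its own fields
    # are all NULL, so it makes the outer test false.
    return all(f is None for f in row)

# select row(null, null) is NULL;               -> t
assert row_is_null((None, None)) is True
# select row(null, row(null, null)) is NULL;    -> f
assert row_is_null((None, (None, None))) is False
```

Under this model the second query returns false precisely because the inner row(null, null) is a value, not the NULL marker.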
Re: [HACKERS] FSM rewrite: doc changes
Heikki Linnakangas [EMAIL PROTECTED] writes:
> To keep everyone who's interested up-to-date, attached is the latest
> patch. ... I find it a bit disturbing that a documentation patch
> actually removes more lines from the manual than it adds, but it's
> quite understandable because it's no longer necessary to explain the
> two GUC options that used to be quite important :-). Comments welcome.

Well, this patch isn't actually supposed to have user-visible impact
other than eliminating a couple of troublesome configuration settings.
So it's entirely expected for the docs to get shorter ;-)

I did another pass of code-reading, and found a lot of nitpicks and some
not-so-trivial issues. In no particular order:

* Copyright in indexfsm.c is a year off.

* InitIndexFreeSpaceMap should have a comment.

* The comment for RecordFreeIndexPage gives the function's name
incorrectly.

* InitFreeSpaceMap() should be explicitly declared as taking void in its
definition.

* FreeSpaceMapTruncateRel seems to have a bug in its early-exit test: in
the case where the number of FSM blocks stays the same, it fails to zero
out slots in the last block. I also think it's got an off-by-one problem
in figuring the number of FSM blocks: for the normal case where the new
heap end is in the middle of a FSM block, shouldn't new_nfsmblocks be one
larger than it is? The case where nblocks is an exact multiple of
SlotsPerFSMPage would need to be special-cased to be exactly correct,
though I see no real harm in letting the FSM be left one page too big in
that case.

* The patch shouldn't be touching bufmgr.c at all any more --- or at
least, none of the diffs there are improvements.

* Docs for contrib/pageinspect still need work: the 3-parameter form of
get_raw_page isn't documented, nor the fork behavior of the 2-parameter
form.

* In gistvacuum.c, you've removed the code that adjusts totFreePages to
not count pages truncated away. I think you could just subtract the
number of truncated pages from it, since they must have been counted in
it earlier. (ginvacuum.c seems to get this right.)

* I do not like the kluge in heap_xlog_clean one bit, and think it's
unnecessary anyway since we are not relying on the FSM to be accurate.
Suggest reverting the heapam.c changes except for heap_sync().

* rd_fsm_nblocks_cache should be reset in the places where rd_targblock
is. You seem to have tracked the clearings of rd_smgr, which is not the
right thing at all.

* I see you renamed "next", which is good, but the README isn't up to
speed on it and a lot of the comments aren't either.

* Since fp_next_slot is signed, the sanity check in fsm_search_avail had
better include target < 0.

* The new search algorithm in fsm_search_avail still doesn't work.
Consider what happens when the target is the rightmost slot on the page;
it certainly won't wrap properly.

* fsm_truncate_avail seems quite broken: it's clearing the whole page
always.

* In fsm_rebuild_page, surely we needn't check if (lchild <
NodesPerPage). Also you probably ought to make it

	if (fsmpage->fp_nodes[nodeno] != newvalue)
	{
		fsmpage->fp_nodes[nodeno] = newvalue;
		changed = true;
	}

to avoid useless write traffic into a shared buffer.

* I think DEPTH should be a macro, not a static int; it's certainly
reducible to a compile-time constant. Also I wonder whether you really
need the SlotsPerFSMPagePowers[] array at all (and if not, you could get
rid of InitFreeSpaceMap). It's used in only one place, and it seems a
bit hard to argue that a multiplication loop really needs to be avoided
there --- the division loop that comes after it will cost a lot more,
and in any case both are negligible compared to the shared buffer fetch
that's about to occur.

* This test in fsm_space_needed_to_cat:

	if (needed >= (FSM_CATEGORIES - 1) * FSM_CAT_STEP)
		elog(ERROR, "invalid FSM request size");

reveals a rather fundamental problem: it is clearly possible for this
test to fail on valid request sizes, because the page header overhead is
less than FSM_CAT_STEP (especially if BLCKSZ is more than 8K). I'm not
sure about a really clean solution here. We could offset the
needed_to_cat and avail_to_cat calculations so that category 255
corresponds exactly to the maximum possible free space, but that
requires assuming that FSM knows exactly what that is, which is a bit
unpleasant. Thoughts?

* It seems a bit schizophrenic that fsm_search_avail takes a Buffer when
all the other functions in fsmpage.c take Page arguments. I see why
fsm_search_avail needs to do that, but maybe it'd be better if the other
functions did too?

* fsm_search() should not take addr as an argument, since it has a
built-in assumption that it is started at the root.

* I find the use of "eof" as both a local variable and a parameter in
fsm_vacuum_page to be pretty poor programming practice. Maybe call the
parameter eof_p?

* Shouldn't fsm_redo include a FreeFakeRelcacheEntry call?
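One of the review items above, the failure of fsm_search_avail to wrap when the starting target is the rightmost slot on the page, can be illustrated with a small model. This is a hypothetical Python sketch of a correctly wrapping circular search, not the actual FSM code; the function and parameter names are invented for illustration:

```python
# Hypothetical sketch of a wrap-correct slot search: starting from
# `target`, visit every slot exactly once, wrapping past the rightmost
# slot back to slot 0. The reported bug is a search that stops at the
# end of the page instead of wrapping.

def search_avail(categories, target, needed):
    """Return the first slot (searching circularly from `target`) whose
    free-space category is >= `needed`, or -1 if none qualifies."""
    n = len(categories)
    for i in range(n):
        slot = (target + i) % n   # modular arithmetic provides the wrap
        if categories[slot] >= needed:
            return slot
    return -1

# Start at the rightmost slot; the only match is slot 0, which a
# non-wrapping search would miss.
assert search_avail([7, 0, 0, 0], target=3, needed=5) == 0
```

A search loop written with a plain `for slot in range(target, n)` exhibits exactly the reported failure: a match to the left of the rightmost starting slot is never found.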
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
> On Thu, 2008-09-25 at 18:28 -0400, Tom Lane wrote:
> > After reading this for awhile, I realized that there is a rather
> > fundamental problem with it: it switches into consistent recovery
> > mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint.
> > In a crash recovery situation that typically is before the last
> > checkpoint (if indeed it's not still zero), and what that means is
> > that this patch will activate the bgwriter and start letting in
> > backends instantaneously after a crash, long before we can have any
> > certainty that the DB state really is consistent.
> >
> > In a normal crash recovery situation this would be easily fixed by
> > simply not letting it go to consistent recovery state at all, but
> > what about recovery from a restartpoint? We don't want a slave that's
> > crashed once to never let backends in again. But I don't see how to
> > determine that we're far enough past the restartpoint to be
> > consistent again. In crash recovery we assume (without proof ;-))
> > that we're consistent once we reach the end of valid-looking WAL, but
> > that rule doesn't help for a slave that's following a continuing WAL
> > sequence. Perhaps something could be done based on noting when we
> > have to pull in a WAL segment from the recovery_command, but it
> > sounds like a pretty fragile assumption.
>
> Seems like we just say we only signal the postmaster if
> InArchiveRecovery. Archive recovery from a restartpoint is still
> archive recovery, so this shouldn't be a problem in the way you
> mention. The presence of recovery.conf overrides all other cases.

What that implements is my comment that we don't have to let anyone in
at all during a plain crash recovery. It does nothing AFAICS for the
problem that when restarting archive recovery from a restartpoint, it's
not clear when it is safe to start letting in backends. You need to get
past the highest LSN that has made it out to disk, and there is no good
way to know what that is.

Unless we can get past this problem the whole thing seems a bit dead in
the water :-(

> > * I'm a bit uncomfortable with the fact that the
> > IsRecoveryProcessingMode flag is read and written with no lock.
>
> It's not a dynamic state, so I can fix that inside
> IsRecoveryProcessingMode() with a local state to make the check faster.

Erm, this code doesn't look like it can allow IsRecoveryProcessingMode
to become locally true in the first place? I guess you could fix it by
initializing IsRecoveryProcessingMode to true, but that seems likely to
break other places. Maybe better is to have an additional local state
variable showing whether the flag has ever been fetched from shared
memory.

The other issues don't seem worth arguing about ...

regards, tom lane
Re: [HACKERS] Proposal: move column defaults into pg_attribute along with attacl
Markus,

* Markus Wanner ([EMAIL PROTECTED]) wrote:
> What does the subobject column for pg_shdepend buy us?

Tracking column-level ACL dependencies rather than having those
dependencies only be at the table level. This complicates pg_shdepend
some, but simplifies the dependency handling in the ACL area and in
handling table/column drops.

I'm still not a fan of having column-level deps handled differently
between pg_shdepend and pg_depend, but that's not something which has to
be addressed directly by the column-level privs patch. Perhaps once it's
done I'll do a proof-of-concept for removing pg_attdef.

Thanks,

Stephen
[HACKERS] Ad-hoc table type?
I was in a discussion with someone about the difference between ad-hoc
storage systems and SQL. Yes, I know, I was rolling my eyes as well. One
thing that did strike me, though, was the idea that a table could
contain a variable number of columns. Something like this:

create adhoc table foo ();

insert into foo (name, rank, serial) values ('joe', 'sargent', '42');

In an ad-hoc table type, when an insert is made and a column is not
found, a new varchar column is added. I know the idea has a lot of
holes, and is probably a bad idea, but it answers an important problem
of easily mapping programmatic types to a database. Anyone think it's
interesting?
Re: [HACKERS] Ad-hoc table type?
[EMAIL PROTECTED] writes:
> Something like this:
>
> create adhoc table foo ();
> insert into foo (name, rank, serial) values ('joe', 'sargent', '42');
>
> In an ad-hoc table type, when an insert is made, and a column is not
> found, then a new varchar column is added. I know the idea has a lot
> of holes, and is probably a bad idea, but it answers an important
> problem of easily mapping programmatic types to a database.

Seems like a table with one contrib/hstore column might be more relevant
to this guy's idea of how to do database design.

regards, tom lane
Re: [HACKERS] Ad-hoc table type?
> [EMAIL PROTECTED] writes:
> > Something like this:
> > create adhoc table foo ();
> > insert into foo (name, rank, serial) values ('joe', 'sargent', '42');
> > In an ad-hoc table type, when an insert is made, and a column is not
> > found, then a new varchar column is added. I know the idea has a lot
> > of holes, and is probably a bad idea, but it answers an important
> > problem of easily mapping programmatic types to a database.
>
> Seems like a table with one contrib/hstore column might be more
> relevant to this guy's idea of how to do database design.

That's actually a very cool module; I hadn't seen it before. I've
considered writing something like it, but more XML-centric, but I'm not
sure it answers the concept.

I'm not sure if you have dealt with web site sessions and object
persistence crap, but it's a pain to get up and running, and improving
performance is a drag. Web guys tend to know very little about databases
and tend, sadly, not to be very inquisitive about such things. Web
session and user attribute objects are typically stored in a database as
XML, JSON, or some other aggregated format in a single column (hstore).
That works great for when you just need to access the data by the key,
but if you want to use the data outside the web application for
something like OLAP, you have to decide which attributes reside in the
aggregate column or get promoted to a full-fledged column. That's why
you'll see tables with username, passwdhash, email, etc. in addition to
an aggregated column of things like screen template, age, etc.

So, how do you have a table with a generally arbitrary number of columns
without creating some sort of aggregate column? With an aggregate
column, the data isn't on the same level as real column data, so you
need to parse the aggregate to extract a value, and you have to do that
for each value. On top of that, you then have to explain your aggregate
strategy to the web guys.

Being able to insert arbitrary named values, and extracting them
similarly, IMHO works better and more naturally than some external
aggregate system built on a column. I know it is a little
outside-the-box thinking; what do you think?
Re: [HACKERS] Null row vs. row of nulls in plpgsql
Greg Stark [EMAIL PROTECTED] writes:
> On 27 Sep 2008, at 09:56 PM, Tom Lane [EMAIL PROTECTED] wrote:
> > ISTM that the fundamental problem is that plpgsql doesn't distinguish
> > properly between a null row value (eg, null::somerowtype) and a row
> > of null values (eg, row(null,null,...)::somerowtype). When that code
> > was designed, our main SQL engine was pretty fuzzy about the
> > difference too, but now there is a clear semantic distinction.
>
> Iirc the reason for this fuzziness came from the SQL spec definition of
> IS NULL for rows. As long as you maintain that level of spec-compliance
> I don't think there are any other important constraints on pg
> behaviour.

I started to poke into this and found out that it was a bit subtler than
I thought. It'd be possible to associate a "rowisnull" state value with
a row variable, but the problem is that plpgsql treats the row fields as
independent variables that can be accessed without touching the row. In
particular you can assign null or nonnull values to individual fields.
So consider

	-- presumably, this'll set rowisnull to TRUE:
	rowvar := NULL;
	-- this had better cause rowisnull to become FALSE:
	rowvar.field1 := 42;
	-- does this cause it to become TRUE again?
	rowvar.field1 := NULL;

There are a bunch of implementation problems with making any such
behavior happen, since the row field variables don't currently know that
they are members of a row, and indeed it's possible for the same
variable to be a member of more than one row. But the core issue is that
this interaction seems to fuzz the distinction between "row is null" and
"all the row's elements are null". In particular, if you think that
rowisnull should be TRUE after the above sequence, then I think you are
saying they are the same thing. So maybe the spec authors are smarter
than we are.

Thoughts? What would a consistent behavior look like?
regards, tom lane
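The assignment sequence in Tom's message can be traced through one candidate semantics. This Python sketch is purely hypothetical (it is not plpgsql source, and the transition rules are exactly the open question in the thread): it adopts the reading that any field assignment, even of NULL, leaves the row as a whole non-null.

```python
# Hypothetical model of a plpgsql row variable with a rowisnull flag.
# Transition rule adopted here (one of the debatable choices): assigning
# to any individual field makes the row non-null, even if the assigned
# value is NULL.

class PlRow:
    def __init__(self, nfields):
        self.fields = [None] * nfields
        self.row_is_null = True        # starts out as null::rowtype

    def assign_row_null(self):         # rowvar := NULL
        self.fields = [None] * len(self.fields)
        self.row_is_null = True

    def assign_field(self, i, value):  # rowvar.fieldN := value
        self.fields[i] = value
        self.row_is_null = False       # touching a field un-nulls the row

r = PlRow(2)
r.assign_row_null()        # rowvar := NULL        -> rowisnull TRUE
r.assign_field(0, 42)      # rowvar.field1 := 42   -> rowisnull FALSE
r.assign_field(0, None)    # rowvar.field1 := NULL -> still FALSE here
assert r.row_is_null is False
```

The alternative rule, flipping row_is_null back to TRUE once every field is NULL again, is precisely the behavior that collapses "row is null" into "all the row's elements are null".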
Re: [HACKERS] Ad-hoc table type?
[EMAIL PROTECTED] writes:
> Being able to insert arbitrary named values, and extracting them
> similarly, IMHO works better and more naturally than some external
> aggregate system built on a column. I know it is a little outside the
> box thinking, what do you think?

I'm failing to see the point. Allowing columns to spring into existence
without any forethought seems to me to be all minuses and no pluses
worth mentioning.

* What if the column name is just a typo?

* What datatype should it have? (Always varchar is just lame.)

* Should it have an index? If so, should it be unique?

* If you keep doing this, you'll soon find yourself reading out
unbelievably wide tables (lots of columns), which won't be especially
easy or efficient to process on either the backend or the client side.
Plus you might run into the max-columns-per-tuple limit.

If you've expended enough thought to be sure that the column is not just
a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN
command to tell the database the results of your genius.

I do see the point that switching from "member of an hstore column" to
"real database column" is pretty painful, but I don't see that "allow
columns to spring into existence" solves that in any meaningful way. Is
there some other way we could address such conversions?

BTW, I think it is (or should be) possible to create an index on
hstore->'mycol', so at least one of the reasons why you should *need* to
switch to a real database column seems bogus.

regards, tom lane
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
On Sun, 2008-09-28 at 14:02 -0400, Tom Lane wrote:
> It does nothing AFAICS for the problem that when restarting archive
> recovery from a restartpoint, it's not clear when it is safe to start
> letting in backends. You need to get past the highest LSN that has
> made it out to disk, and there is no good way to know what that is.
>
> Unless we can get past this problem the whole thing seems a bit dead in
> the water :-(

I agree with the importance of the problem, but I don't fully understand
the circumstances under which you see a problem arising.

AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in
the controlfile, so whenever we restart we can see that it has been set
previously and that we are now beyond it. So if we crash during recovery
and then restart *after* we reached minRecoveryLoc, then we resume in
safe mode almost immediately. If we crash during recovery before we
reached minRecoveryLoc, then we continue until we find it.

There is a loophole, as described in a separate post, but that can be
plugged by offering explicit setting of minRecoveryLoc from
recovery.conf. Most people use pg_start_backup(), so they do not
experience the need for that.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Ad-hoc table type?
Not that I'm agreeing with the direction, but just as a thinking
experiment:

Tom Lane wrote:
> [EMAIL PROTECTED] writes:
> > Being able to insert arbitrary named values, and extracting them
> > similarly, IMHO works better and more naturally than some external
> > aggregate system built on a column. I know it is a little outside
> > the box thinking, what do you think?
>
> I'm failing to see the point. Allowing columns to spring into existence
> without any forethought seems to me to be all minuses and no pluses
> worth mentioning.
>
> * What if the column name is just a typo?

If it's a field in a data structure from a language such as Java, it's
not a typo.

> * What datatype should it have? (Always varchar is just lame.)

SQLite uses always varchar and it doesn't seem to be a problem. For
simpler numbers like 0, the text form can be more compact, and the
database may be portable across different hardware architectures.

> * Should it have an index? If so, should it be unique?

It might be cool for indexes to automatically appear as they become
beneficial (and be removed as they become problematic). Unique is a
constraint which should be considered separately from whether it should
be an index or not. I don't know if it would be useful or not.

> * If you keep doing this, you'll soon find yourself reading out
> unbelievably wide tables (lots of columns), which won't be especially
> easy or efficient to process on either the backend or the client side.
> Plus you might run into the max-columns-per-tuple limit.

Introduce variable field-order for tuples? Only provide values if
non-null? :-)

> If you've expended enough thought to be sure that the column is not
> just a typo, ISTM that you can afford to enter an ALTER TABLE ADD
> COLUMN command to tell the database the results of your genius.
>
> I do see the point that switching from "member of an hstore column" to
> "real database column" is pretty painful, but I don't see that "allow
> columns to spring into existence" solves that in any meaningful way.
> Is there some other way we could address such conversions?
>
> BTW, I think it is (or should be) possible to create an index on
> hstore->'mycol', so at least one of the reasons why you should *need*
> to switch to a real database column seems bogus.

I find the Oracle nested table and data structure support enticing,
although I do not have experience with it. It seems like it might be a
more mature implementation of hstore? If hstore had everything that was
required in terms of performance or flexibility, we wouldn't need fixed
columns at all?

But yes - I tend to agree that the object persistence layer can be
hidden away behind something like the Java object persistence model,
automatically doing ALTER TABLE or providing a configured mapping from a
description file. This isn't a problem that needs to be solved at the
database layer.

Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
> > It does nothing AFAICS for the problem that when restarting archive
> > recovery from a restartpoint, it's not clear when it is safe to start
> > letting in backends. You need to get past the highest LSN that has
> > made it out to disk, and there is no good way to know what that is.
>
> AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in
> the controlfile, so whenever we restart we can see that it has been set
> previously and now we are beyond it.

Right ...

> So if we crash during recovery and then restart *after* we reached
> minRecoveryLoc then we resume in safe mode almost immediately.

Wrong. What minRecoveryLoc is is an upper bound for the LSNs that might
be on-disk in the filesystem backup that an archive recovery starts
from. (Defined as such, it never changes during a restartpoint
crash/restart.) Once you pass that, the on-disk state as modified by any
dirty buffers inside the recovery process represents a consistent
database state. However, the on-disk state alone is not guaranteed
consistent. As you flush some (not all) of your shared buffers you enter
other not-certainly-consistent on-disk states. If we crash in such a
state, we know how to use the last restartpoint plus WAL replay to
recover to another state in which disk + dirty buffers are consistent.
However, we reach such a state only when we have read WAL to beyond the
highest LSN that has reached disk --- and in recovery mode there is no
clean way to determine what that was.

Perhaps a solution is to make XLogFlush not be a no-op in recovery mode,
but have it scribble a highest-LSN somewhere on stable storage (maybe
scribble on pg_control itself, or maybe better someplace else). I'm not
totally sure about that. But I am sure that doing nothing will be
unreliable.

regards, tom lane
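Tom's suggestion of having XLogFlush persist a high-water-mark LSN during recovery can be sketched abstractly. This is a hypothetical Python model of the idea only; the class, method, and field names are invented and do not correspond to actual PostgreSQL code:

```python
# Hypothetical sketch: instead of being a no-op during recovery,
# XLogFlush records the highest LSN that has reached disk in stable
# storage (e.g. pg_control). After a crash, a recovering slave knows it
# must replay WAL past that point before its on-disk state is
# consistent and backends may be admitted.

class RecoveryState:
    def __init__(self):
        self.highest_flushed_lsn = 0   # persisted high-water mark

    def xlog_flush(self, lsn):
        # remember the highest LSN any flushed buffer carried
        if lsn > self.highest_flushed_lsn:
            self.highest_flushed_lsn = lsn

    def safe_to_admit_backends(self, replayed_lsn):
        # consistent only once replay has passed everything on disk
        return replayed_lsn >= self.highest_flushed_lsn

s = RecoveryState()
for lsn in (100, 250, 180):    # dirty buffers flushed out of order
    s.xlog_flush(lsn)
assert s.safe_to_admit_backends(200) is False
assert s.safe_to_admit_backends(250) is True
```

The out-of-order flush sequence is the crux: without the persisted maximum, a restarted slave has no way to know that LSN 250 already made it to disk.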
Re: [HACKERS] Ad-hoc table type?
> [EMAIL PROTECTED] writes:
> > Being able to insert arbitrary named values, and extracting them
> > similarly, IMHO works better and more naturally than some external
> > aggregate system built on a column. I know it is a little outside
> > the box thinking, what do you think?
>
> I'm failing to see the point. Allowing columns to spring into existence
> without any forethought seems to me to be all minuses and no pluses
> worth mentioning.
>
> * What if the column name is just a typo?

In an automated system like PHP, Java, etc., that's not too likely.

> * What datatype should it have? (Always varchar is just lame.)

varchar or text is not just lame; SQLite used to do that exclusively.
One could argue that XML is nothing more than text.

> * Should it have an index? If so, should it be unique?

The answer to that is, well, no, not unless the dba generates one or it
is declared. Just like any other column. All the rules that apply to
CREATE TABLE and ALTER TABLE ADD COLUMN just apply naturally, as would
be expected.

create adhoc table userdata (username varchar, email varchar, primary key(email));

> * If you keep doing this, you'll soon find yourself reading out
> unbelievably wide tables (lots of columns), which won't be especially
> easy or efficient to process on either the backend or the client side.
> Plus you might run into the max-columns-per-tuple limit.

Well, I fully understand that it is not a general-purpose
unlimited-width sort of thing. In a programming environment, the target
environment for this type of feature, it is unlikely to be a run-away
problem.

> If you've expended enough thought to be sure that the column is not
> just a typo, ISTM that you can afford to enter an ALTER TABLE ADD
> COLUMN command to tell the database the results of your genius.

Like I said, if you've never dealt with a live web site, maintained by a
team of web dudes working furiously to keep their jobs and get paid,
your only hope to keep up with "Oh! I needed to add the 'time to live'
of the session into the session data" is to use an aggregate storage
system.

> I do see the point that switching from "member of an hstore column" to
> "real database column" is pretty painful, but I don't see that "allow
> columns to spring into existence" solves that in any meaningful way.
> Is there some other way we could address such conversions?

Every other solution creates a second tier of data storage. You either
deal with data elements at the table level, or you create a
roll-your-own aggregate mechanism, or make a HUGE table of
(user, name, value) rows and force a join and index scan for every
select. (A million users, 5-10 attributes each, is an expensive join.)

> BTW, I think it is (or should be) possible to create an index on
> hstore->'mycol', so at least one of the reasons why you should *need*
> to switch to a real database column seems bogus.

Oh, yea, function indexes work great. I think you did that right?

For what it's worth, I don't expect you to jump all over this. It really
is a divergence from classic SQL design. I'm not even sure I like it. In
fact, I don't like it, but the argument that you are being forced to
create a second-class data storage mechanism or a relational join for
data that is logically in a single relation does cause one to ponder the
problem.
[HACKERS] Operation needed for datfrozenxid bug?
Hello,

The datfrozenxid bug is fixed in 8.3.4, but do I need any additional
operations after upgrading the postgres binaries?

| Fix potential miscalculation of datfrozenxid (Alvaro)
| This error may explain some recent reports of failure
| to remove old pg_clog data.

I assume that pg_database.datfrozenxid is wrongly set to 3
(FirstNormalTransactionId) if the bug has occurred.

1. Can I use the following SQL to check whether the bug has occurred?

    SELECT * FROM pg_database WHERE datfrozenxid = 3;

   (It will return some rows if the bug has occurred.)

2. What is needed to fix the corrupted system catalog, something like
   the following SQL?

    UPDATE pg_database SET datfrozenxid = relfrozenxid
      FROM (SELECT relfrozenxid FROM pg_class
             WHERE relkind IN ('r', 't')
             ORDER BY age(relfrozenxid) DESC LIMIT 1) AS tmp
     WHERE datname = current_database();

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
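A less invasive alternative to hand-editing the catalog may be worth noting (a sketch, not an official recovery procedure): a database-wide VACUUM run by a superuser causes the server to recompute pg_database.datfrozenxid from the per-table pg_class.relfrozenxid values, so running it in each affected database after installing the fixed binaries should repair the entry without manual catalog surgery.

```sql
-- Run in each database whose datfrozenxid looks suspect, as a
-- superuser, after upgrading to the fixed binaries. A database-wide
-- VACUUM (no table name) recomputes this database's datfrozenxid.
VACUUM;

-- Then verify the catalog looks sane again:
SELECT datname, age(datfrozenxid)
  FROM pg_database
 ORDER BY age(datfrozenxid) DESC;
```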
Re: [HACKERS] parallel pg_restore - WIP patch
Andrew Dunstan wrote:
> this works better but there is something fishy still - using the same
> dump file I get a proper restore using pg_restore normally. If I
> however use -m for a parallel one I only get parts (in this case only
> 243 of the 709 tables) of the database restored ...
>
> Yes, there are several funny things going on, including some stuff
> with dependencies. I'll have a new patch tomorrow with luck. Thanks
> for testing.

OK, in this version a whole heap of bugs are fixed, mainly those to do
with dependencies and saved state. I get identical row counts in the
source and destination now, quite reliably.

cheers

andrew

Index: pg_backup.h
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_backup.h,v
retrieving revision 1.47
diff -c -r1.47 pg_backup.h
*** pg_backup.h	13 Apr 2008 03:49:21 -0000	1.47
--- pg_backup.h	29 Sep 2008 02:43:51 -0000
***************
*** 123,128 ****
--- 123,130 ----
  	int			suppressDumpWarnings;	/* Suppress output of WARNING entries
  										 * to stderr */
  	bool		single_txn;
+ 	int			number_of_threads;
+ 	bool		truncate_before_load;
  
  	bool	   *idWanted;		/* array showing which dump IDs to emit */
  } RestoreOptions;
***************
*** 165,170 ****
--- 167,173 ----
  extern void CloseArchive(Archive *AH);
  
  extern void RestoreArchive(Archive *AH, RestoreOptions *ropt);
+ extern void RestoreArchiveParallel(Archive *AH, RestoreOptions *ropt);
  
  /* Open an existing archive */
  extern Archive *OpenArchive(const char *FileSpec, const ArchiveFormat fmt);

Index: pg_backup_archiver.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.158
diff -c -r1.158 pg_backup_archiver.c
*** pg_backup_archiver.c	5 Sep 2008 23:53:42 -0000	1.158
--- pg_backup_archiver.c	29 Sep 2008 02:43:52 -0000
***************
*** 27,38 ****
--- 27,50 ----
  #include <unistd.h>
  
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ 
  #ifdef WIN32
  #include <io.h>
  #endif
  
  #include "libpq/libpq-fs.h"
  
+ typedef struct _parallel_slot
+ {
+ 	pid_t		pid;
+ 	TocEntry   *te;
+ 	DumpId		dumpId;
+ } ParallelSlot;
+ 
+ #define NO_SLOT (-1)
  
  const char *progname;
***************
*** 70,76 ****
--- 82,99 ----
  static void dumpTimestamp(ArchiveHandle *AH, const char *msg, time_t tim);
  static OutputContext SetOutput(ArchiveHandle *AH, char *filename, int compression);
  static void ResetOutput(ArchiveHandle *AH, OutputContext savedContext);
+ static bool work_is_being_done(ParallelSlot *slot, int n_slots);
+ static int	get_next_slot(ParallelSlot *slots, int n_slots);
+ static TocEntry *get_next_work_item(ArchiveHandle *AH);
+ static void prestore(ArchiveHandle *AH, TocEntry *te);
+ static void mark_work_done(ArchiveHandle *AH, pid_t worker, ParallelSlot *slots, int n_slots);
+ static int	_restore_one_te(ArchiveHandle *ah, TocEntry *te, RestoreOptions *ropt, bool is_parallel);
+ static void _reduce_dependencies(ArchiveHandle *AH, TocEntry *te);
+ static void _fix_dependency_counts(ArchiveHandle *AH);
+ static void _inhibit_data_for_failed_table(ArchiveHandle *AH, TocEntry *te);
+ 
+ static ArchiveHandle *GAH;
  
  /*
   * Wrapper functions.
***************
*** 125,137 ****
  /* Public */
  void
  RestoreArchive(Archive *AHX, RestoreOptions *ropt)
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
  	teReqs		reqs;
  	OutputContext sav;
- 	bool		defnDumped;
  
  	AH->ropt = ropt;
  	AH->stage = STAGE_INITIALIZING;
--- 148,579 ----
  /* Public */
  void
+ RestoreArchiveParallel(Archive *AHX, RestoreOptions *ropt)
+ {
+ 	ArchiveHandle *AH = (ArchiveHandle *) AHX;
+ 	ParallelSlot *slots;
+ 	int			next_slot;
+ 	TocEntry   *next_work_item = NULL;
+ 	int			work_status;
+ 	pid_t		ret_child;
+ 	int			n_slots = ropt->number_of_threads;
+ 	TocEntry   *te;
+ 	teReqs		reqs;
+ 
+ 	/* AH->debugLevel = 99; */
+ 	/* some routines that use ahlog() don't get passed AH */
+ 	GAH = AH;
+ 
+ 	ahlog(AH, 1, "entering RestoreArchiveParallel\n");
+ 
+ 	slots = (ParallelSlot *) calloc(sizeof(ParallelSlot), n_slots);
+ 	AH->ropt = ropt;
+ 
+ 	/*
+ 	if (ropt->create)
+ 		die_horribly(AH, modulename,
+ 					 "parallel restore is incompatible with --create\n");
+ 	*/
+ 
+ 	if (ropt->dropSchema)
+ 		die_horribly(AH, modulename,
+ 					 "parallel restore is incompatible with --clean\n");
+ 
+ 	if (!ropt->useDB)
+ 		die_horribly(AH, modulename,
+ 					 "parallel restore requires direct database connection\n");
+ 
+ #ifndef HAVE_LIBZ
+ 
+ 	/* make sure we won't need (de)compression we haven't got */
+ 	if (AH->compression != 0 && AH->PrintTocDataPtr != NULL)
+ 	{
+ 		for (te = AH->toc->next; te != AH->toc; te = te->next)
+ 		{
+ 			reqs = _tocEntryRequired(te, ropt, false);
+ 			if (te->hadDumper && (reqs & REQ_DATA) != 0)
+ 				die_horribly(AH, modulename,
+ 							 "cannot restore from compressed archive (compression not
Re: [HACKERS] Ad-hoc table type?
On Sep 28, 2008, at 17:46, Tom Lane wrote:
> BTW, I think it is (or should be) possible to create an index on
> hstore->'mycol', so at least one of the reasons why you should *need*
> to switch to a real database column seems bogus.

The docs say:

    <title>Indexes</title>
    <para>
     <type>hstore</> has index support for <literal>@&gt;</> and
     <literal>?</> operators. You can use either GiST or GIN index
     types. For example:
    </para>
    <programlisting>
    CREATE INDEX hidx ON testhstore USING GIST(h);
    CREATE INDEX hidx ON testhstore USING GIN(h);
    </programlisting>

I'm not sure what that means. Can you create normal btree or hash
indexes on hstore columns? And is the index useful for both `@>` and
`?`?

Thanks,

David
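For concreteness, the two kinds of index under discussion look like this (a sketch only; the table and key names are hypothetical, and it assumes contrib/hstore is installed):

```sql
-- GIN (or GiST) index over the whole hstore column: this is what the
-- docs describe, and it serves the @> and ? operators.
CREATE INDEX hidx ON testhstore USING GIN (h);
SELECT * FROM testhstore WHERE h @> 'template=>blue';
SELECT * FROM testhstore WHERE h ? 'template';

-- Ordinary btree index on one key's value, via an expression index;
-- this is the "index on hstore->'mycol'" case Tom mentions.
CREATE INDEX hidx_ttl ON testhstore ((h -> 'ttl'));
SELECT * FROM testhstore WHERE (h -> 'ttl') = '300';
```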
Re: [HACKERS] Ad-hoc table type?
What you're talking about is a document-based database like StrokeDB or
CouchDB. With hstore you don't need to parse the content of the
'aggregate' column; it provides the necessary methods. Also, we tried
to speed up selects using indexes. Probably we need to refresh our
interest in hstore. Do you have any actual proposals?

Oleg

On Sun, 28 Sep 2008, [EMAIL PROTECTED] wrote:

>> [EMAIL PROTECTED] writes:
>>> Something like this:
>>>
>>>     create adhoc table foo ();
>>>     insert into foo (name, rank, serial) values ('joe', 'sargent', '42');
>>>
>>> In an ad-hoc table type, when an insert is made and a column is not
>>> found, then a new varchar column is added.
>>>
>>> I know the idea has a lot of holes, and is probably a bad idea, but
>>> it answers an important problem of easily mapping programmatic types
>>> to a database.
>>
>> Seems like a table with one contrib/hstore column might be more
>> relevant to this guy's idea of how to do database design.
>
> That's actually a very cool module, I hadn't seen it before. I've
> considered writing something like it, but more XML-centric, but I'm
> not sure it answers the concept.
>
> I'm not sure if you have dealt with web site sessions and object
> persistence crap, but it's a pain to get up and running, and improving
> performance is a drag. Web guys tend to know very little about
> databases and tend, sadly, not to be very inquisitive about such
> things.
>
> Web session and user attribute objects are typically stored in a
> database as XML, JSON, or some other aggregated format in a single
> column (hstore). That works great when you just need to access the
> data by the key, but if you want to use the data outside the web
> application for something like OLAP, you have to decide which
> attributes reside in the aggregate column or get promoted to a
> full-fledged column. That's why you'll see tables with username,
> passwdhash, email, etc. in addition to an aggregated column of things
> like screen template, age, etc.
> So, how do you have a table with a generally arbitrary number of
> columns without creating some sort of aggregate column? With an
> aggregate column, the data isn't on the same level as real column
> data, so you need to parse the aggregate to extract a value, and you
> have to do that for each value. On top of that, you then have to
> explain your aggregate strategy to the web guys.
>
> Being able to insert arbitrary named values, and extracting them
> similarly, IMHO works better and more naturally than some external
> aggregate system built on a column. I know it is a little
> outside-the-box thinking. What do you think?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
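As a concrete illustration of the hstore alternative being weighed in this thread (a sketch only; the table and key names are hypothetical, and it assumes contrib/hstore is installed):

```sql
-- One real table, one hstore column for the ad-hoc attributes.
CREATE TABLE session (
    username varchar PRIMARY KEY,
    attrs    hstore
);

-- Arbitrary named values go in without any ALTER TABLE:
INSERT INTO session VALUES ('joe', 'rank=>sargent, serial=>42, ttl=>300');

-- ...and come back out by key:
SELECT attrs -> 'ttl' FROM session WHERE username = 'joe';
```

The trade-off discussed above remains: any attribute promoted to "real" column status later requires a migration out of the aggregate column.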