Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration
On Sat, 2006-11-18 at 00:13 +0100, Martijn van Oosterhout wrote:

On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:

Having the supporting code in core does not make much of a difference otherwise from having it in contrib, does it?

Given the nonextensibility of gram.y and keywords.c, it has to be in core to even think about having special syntax :-(

Has anyone ever heard of extensible grammars?

(not specifically answering Martijn...)

The main thought for me on this thread is: why do we need to invent *any* grammar to make this work? Why not just use functions? For PITR we have pg_start_backup() rather than BACKUP START. For advisory locks we have pg_advisory_lock(). What's wrong with having pg_tsearch_ functions to do everything?

There's nothing wrong with additional catalog tables/columns that are manipulated by function calls only. We have that already - look at pg_stat_reset() - no grammar stuff there. Anybody with an Oracle or SQL Server background is used to seeing system functions available as function calls; as I've observed above, so are we. We should keep the grammar clean to allow very close adherence to the SQL standard, IMHO.

I would like to see Oleg and Teodor's good work come into core, but I don't want to see bucketfuls of new grammar issues.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
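[Editorial sketch: Simon's function-only alternative might look something like the lines below. Every function name here is hypothetical - no pg_tsearch_* functions existed at the time - and is invented purely to contrast with dedicated grammar.]

```sql
-- Hypothetical function-based interface (all names invented for illustration):
SELECT pg_tsearch_create_config('my_cfg');
SELECT pg_tsearch_add_dictionary('my_cfg', 'en_stem');

-- ...instead of hypothetical dedicated syntax along the lines of:
--   CREATE FULLTEXT CONFIGURATION my_cfg;
```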
Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration
On Sat, 18 Nov 2006, Simon Riggs wrote:

On Sat, 2006-11-18 at 00:13 +0100, Martijn van Oosterhout wrote:

On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:

Having the supporting code in core does not make much of a difference otherwise from having it in contrib, does it?

Given the nonextensibility of gram.y and keywords.c, it has to be in core to even think about having special syntax :-(

Has anyone ever heard of extensible grammars?

(not specifically answering Martijn...) The main thought for me on this thread is: why do we need to invent *any* grammar to make this work? Why not just use functions? For PITR we have pg_start_backup() rather than BACKUP START. For advisory locks we have pg_advisory_lock(). What's wrong with having pg_tsearch_ functions to do everything? There's nothing wrong with additional catalog tables/columns that are manipulated by function calls only. We have that already - look at pg_stat_reset() - no grammar stuff there. Anybody with an Oracle or SQL Server background is used to seeing system functions available as function calls; as I've observed above, so are we. We should keep the grammar clean to allow very close adherence to the SQL standard, IMHO. I would like to see Oleg and Teodor's good work come into core, but I don't want to see bucketfuls of new grammar issues.

Summarizing, we have two questions:

1. Will tsearch come into core?
2. Do we need grammar changes?

I hope we have consensus about 1 - we need FTS as a core feature. The second question is less fundamental; that's why we asked -hackers. So, if we don't touch the grammar, are there any issues with tsearch2 in core?

Regards, Oleg

_ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration
Oleg Bartunov wrote:

So, if we'll not touch grammar, are there any issues with tsearch2 in core ?

Are there any issues with tsearch2 not in core?

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Fwd: [HACKERS] Remove contrib version of rtree_gist --- now in core system ?
-- Forwarded message -- From: jorge alberto [EMAIL PROTECTED] Date: Nov 17, 2006 2:09 PM Subject: Re: [HACKERS] Remove contrib version of rtree_gist --- now in core system ? To: Tom Lane [EMAIL PROTECTED]

hi! thanks for writing!

In postgresql version 8.0.9 the rtree_gist implementation is located at contrib/rtree_gist, ready for compilation and then ready for loading into a database. I want to know where I can find this rtree_gist implementation in versions 8.1.x.

regards jorge

On 11/14/06, Tom Lane [EMAIL PROTECTED] wrote:

jorge alberto [EMAIL PROTECTED] writes:

I have read that since version 8.1 the contrib version of rtree_gist is not in /contrib anymore, now it is in the core system, but where can I find it?

You don't need to find it, it's built in. Just create an index.

regards, tom lane
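[Editorial sketch: for readers wondering what "just create an index" looks like in 8.1 and later, where the R-tree-equivalent GiST operator classes for the geometric types are built in; table and column names below are invented.]

```sql
-- Built-in GiST indexing of a geometric column in 8.1+ (names invented):
CREATE TABLE shapes (id int, b box);
CREATE INDEX shapes_b_idx ON shapes USING gist (b);

-- Overlap queries can now use the index, no contrib module required:
SELECT id FROM shapes WHERE b && box '((0,0),(1,1))';
```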
Re: [HACKERS] Custom Data Type Question
On Wed, 2006-11-15 at 16:38 -0500, Andrew Dunstan wrote:

My little enumkit tool allows you to create enumerations today very easily, but its values are completely hardcoded. However, the above trick still works. The downside is that each enumeration type requires a tiny bit of compilation.

Andrew,

Your enum sounds good, apart from the hardcoded/compilation thing. That is a data management nightmare AFAICS and so restricts the usefulness of the solution. It would be much better to read things dynamically into an array, so using an init function in *preload_libraries would work well.

I'd also love any suggestions as to how we might be able to use a similar local-data-caching mechanism when we specify SQL standard FOREIGN KEYs. It would be really cool to say USING LOCAL CACHE or have some other way of avoiding the overhead of all those stored after triggers and SPI SELECT statements when we've got checks against tables with only a few rows where the values hardly ever change.

The enum concept departs radically from the declarative Referential Integrity concepts that many of us are already used to. I'd like to be able to speed things up without radical re-design of the database... so a few nicely sprinkled ALTER TABLE statements would be a much better way of implementing this IMHO.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3
On Fri, Nov 17, 2006 at 11:40:36PM -0500, Tom Lane wrote:

Stephen Harris [EMAIL PROTECTED] writes:

Why not, after calling fork(), create a new process group with setsid() and then, instead of killing the recovery thread, kill the whole process group (-PID rather than PID)? Then every process (the recovery thread, the system, the script, any child of the script) will all receive the signal.

This seems like a good answer if setsid and/or setpgrp are universally available. I fear it won't work on Windows though :-(.

It's POSIX, so I would suppose it's standard on most modern *nix platforms. Windows... bluh. I wonder how perl handles POSIX::setsid() on Windows!

Also, each backend would become its own process group leader --- does anyone know if adding hundreds of process groups would slow down any popular kernels?

Shouldn't hurt. This is, after all, what using job control in a command line shell (csh, ksh, tcsh, bash, zsh) does. Because you only run one archive or recovery thread at a time (which is very good and very clever) you won't have too many process groups at any instant in time.

[ thinks for a bit... ] Another issue is that there'd be a race condition during backend start: if the postmaster tries to kill -PID before the backend has managed to execute setsid, it wouldn't work.

*ponder* Bugger. Standard solutions (e.g. try three times with a second pause) would mitigate this, but...

Hmm. Another idea is to make the shutdown more co-operative, under control of the script; e.g. an exit code of 0 means the xlog is now available, an exit code of 1 means the log is non-existent (so recovery is complete), and an exit code of 255 means failure to recover; perform database shutdown. In this way a solution similar to the existing trigger files (recovery complete) could be used. It's a little messy in that pg_ctl wouldn't be used to shut down the database; the script would essentially tell the recovery thread to abort, which would tell the main postmaster to shut down.
We'd have no clients connected, no child process running, so a smart shutdown would work. -- rgds Stephen
Re: [HACKERS] [GENERAL] Allowing SYSDATE to Work
Can't keywords share code?

The way to do what you want, I think, is like this:

foo: bar_or_baz { code block } ;

bar_or_baz: bar | baz ;

I'll try that, thanks.
Re: [HACKERS] [GENERAL] Allowing SYSDATE to Work
I suggest you contribute this kind of code to the orafce project [1]

Thanks, I'll go play over there for a while.
Re: [HACKERS] [GENERAL] Allowing SYSDATE to Work
I found it interesting that gram.c and parse.h already supported SYSDATE.

Only after you ran bison ;-). They're derived files.

Well, so much for my conspiracy theory. Thanks for the bison lesson.
Re: [HACKERS] Custom Data Type Question
Simon Riggs wrote:

On Wed, 2006-11-15 at 16:38 -0500, Andrew Dunstan wrote:

My little enumkit tool allows you to create enumerations today very easily, but its values are completely hardcoded. However, the above trick still works. The downside is that each enumeration type requires a tiny bit of compilation.

Andrew, Your enum sounds good, apart from the hardcoded/compilation thing. That is a data management nightmare AFAICS and so restricts the usefulness of the solution.

Simon,

Tom Dunstan has submitted a patch for first-class enum types that do not have the compilation requirement - it's in the queue for 8.3. You might want to review that.

cheers andrew
Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration
Hi,

Peter Eisentraut wrote:

Are there any issues with tsearch2 not in core?

I have run into troubles when restoring a dump, especially across different versions of PostgreSQL and tsearch2, mainly because the pg_ts_* tables are not system tables and thus need to be restored or installed separately.

And there still is the packaging issue which needs to be addressed. It's not complicated, but a PITA to compile stemmers and set up custom dictionaries. What's really needed IMO is clever packaging, including stemmers and dictionaries for as many languages as we can come up with. So on a Debian system, it should become as simple as:

apt-get install postgresql-contrib-8.3
apt-get install postgresql-language-pack-english-8.3
apt-get install postgresql-language-pack-german-8.3
apt-get install postgresql-language-pack-russian-8.3

Inclusion into core surely does not help with that.

Relabeling contrib to modules or extras or something would probably give some people a warm fuzzy feeling when installing. OTOH, these are probably the very same people who get excited about tsearch2 in core, so if we want to satisfy them, we had better put it right into core... I dunno.

Regards Markus
Re: [HACKERS] [GENERAL] Allowing SYSDATE to Work
Why should we add this Oraclism to PostgreSQL? It doesn't add any new feature.

Certainly, this feature falls well within the class of completely gratuitous proprietary extensions that we typically reject.

I now agree completely. My purpose is to migrate Oracle databases to Postgres, and I had thought that Oracle didn't support CURRENT_DATE, CURRENT_TIMESTAMP, and so on. However, I've just learned otherwise. So, I think the proper migration process for a production database would be to first change the Oracle DB to use CURRENT_DATE (or some other standard pseudo column), since that will work properly under both Oracle and Postgres.
Re: [HACKERS] Proposal: syntax of operation with tsearch's configur
--- Original Message --- From: Peter Eisentraut [EMAIL PROTECTED] To: Jeremy Drake [EMAIL PROTECTED] Sent: 18/11/06, 07:30:48 Subject: Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration

Jeremy Drake wrote:

I am currently in the position that my hosting provider is apprehensive about installing modules in contrib because they believe they are less secure.

Using irrational and unfounded statements one can of course make arguments for just about anything, but that won't help us.

Unfounded? pg_file_write() is in contrib, not core, precisely because some considered it a possible risk (despite COPY being as dangerous) and wanted to make it a non-default option. Personally I think a GUC to disable it, COPY, and other potentially dangerous utility functions would be preferable, but that's a whole other argument.

Regards, Dave
Re: [HACKERS] Proposal: syntax of operation with
Peter Eisentraut wrote:

Oleg Bartunov wrote:

So, if we'll not touch grammar, are there any issues with tsearch2 in core ?

Are there any issues with tsearch2 not in core?

Quite apart from anything else, it really needs documentation to the standard we give other core features. I think if a feature will be of sufficiently general use it should be a candidate for inclusion, and text search certainly comes within that category in my mind.

cheers andrew
Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration
On Sat, 18 Nov 2006, Andrew Dunstan wrote:

Peter Eisentraut wrote:

Oleg Bartunov wrote:

So, if we'll not touch grammar, are there any issues with tsearch2 in core ?

Are there any issues with tsearch2 not in core?

Quite apart from anything else, it really needs documentation to the standard we give other core features.

Sure. I just learned how to build the pg documentation (successfully) and am researching the documentation standard. Should we write a separate full text search chapter and/or add descriptions to the relevant chapters?

I think if a feature will be of sufficiently general use it should be a candidate for inclusion, and text search certainly comes within that category in my mind.

It could help us in Pg-MySQL discussions, at least, since we beat mysql's fts :)

Regards, Oleg

_ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
[HACKERS] Ontology on PostgreSQL - is there something?
Dear PG-hackers,

Based on the paper below [1], I ask: is anyone working on, or has anyone already tried, such a native implementation on PostgreSQL? I didn't find anything related on pgFoundry. There is also a presentation [2] related to the paper.

By Souripriya Das, Eugene Inseok Chong, George Eadon, Jagannathan Srinivasan, Proceedings of the 30th VLDB Conference, Toronto, Canada:

Ontologies are increasingly being used to build applications that utilize domain-specific knowledge. This paper addresses the problem of supporting ontology-based semantic matching in RDBMS. The approach enables users to reference ontology data directly from SQL using the semantic match operators, thereby opening up possibilities of combining with other operations such as joins as well as making the ontology-driven applications easy to develop and efficient. In contrast, other approaches use RDBMS only for storage of ontologies and querying of ontology data is typically done via APIs. This paper presents the ontology-related functionality including inferencing, discusses how it is implemented on top of Oracle RDBMS, and illustrates the usage with several database applications.

[1] http://www.cs.toronto.edu/vldb04/protected/eProceedings/contents/pdf/IND3P1.PDF
[2] http://www-scf.usc.edu/~csci586/ppt-2005/dguo.pps

Besides, what are your opinions on the subject? Is it worthwhile to have such a native implementation on PG, even as a modest contrib module? Based on your development know-how on the backend internals, how difficult is it to develop this (3 operators + 1 index type)?

-- Best regards, Rodrigo Hjort http://icewall.org/~hjort CELEPAR - Cia de Informática do Paraná - Brasil http://www.pr.gov.br
Re: [HACKERS] Proposal: syntax of operation with
Andrew Dunstan wrote:

Peter Eisentraut wrote:

Oleg Bartunov wrote:

So, if we'll not touch grammar, are there any issues with tsearch2 in core ?

Are there any issues with tsearch2 not in core?

Quite apart from anything else, it really needs documentation to the standard we give other core features. I think if a feature will be of sufficiently general use it should be a candidate for inclusion, and text search certainly comes within that category in my mind.

I agree here - full text search is of general use (and a very often requested feature) - including it in core will both help us in marketing postgresql (which should not be seen as a bad thing at all) and, more to the point, it provides an in-core user and showcase for two very powerful and innovative technologies - GiST and GIN.

Stefan
Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration
Alvaro Herrera wrote:

Oleg Bartunov wrote:

On Fri, 17 Nov 2006, Andrew Dunstan wrote:

I am also a bit concerned that the names of the proposed objects (parser, dictionary) don't convey their purpose adequately. Maybe TS_DICTIONARY and TS_PARSER might be better if we in fact need to name them.

this looks reasonable to me.

Huh, but we don't use keywords with ugly abbreviations and underscores. How about FULLTEXT DICTIONARY and FULLTEXT PARSER? (Using FULLTEXT instead of FULL TEXT means you don't create common reserved words, and furthermore you don't collide with an existing type name.)

sounds fine

I also think the thousands of lines is an exaggeration :-) The grammar should take a couple dozen at most. The rest of the code would go to their own files. We should also take the opportunity to discuss new keywords for the XML support -- will we use new grammar, or functions?

That is a good question, and we should decide on a direction for it - we already have a feature in shipping code that causes quite some confusion in that regard (autovacuum). What I see from supporting/consulting people is that more and more people are adopting autovacuum for their databases, but those with complex ones want to override it on a per-table basis. We already provide a rather crude interface for that - namely manually inserting some rows into a system table - which is confusing the heck out of people (they respond with "why is there no ALTER AUTOVACUUM SET ... or an equivalent pg_* function for that?"). I'm not too sure what the most suitable interface would be, but finding a consistent solution might be good nevertheless.

Stefan
Re: [HACKERS] [Fwd: Index Advisor]
Hi Gurjeet,

I will look at the pg_advise bug and will send a patch ASAP.

Best, Kai

On 15.11.2006 at 15:34, Gurjeet Singh wrote:

BUGS:
.) The SELECTs in pg_advise are returning wrong results when the same index is suggested twice, because of the SUM() aggregates.
.) I doubt that on a table t(a,b), for a suggestion of idx(b,a), pg_advise will suggest idx(a,b);

Wish-list:
.) Make pg_indexadvisor a user table. Reason: a normal user cannot do delete from pg_indexadvisor. Difficulty: Need to know how to do insert into pg_indexadvisor values( 1, ...) from within the backend; that is, need to study/invent RSI (Recursive SQL Interface). Trial code can be seen by searching for: exec_simple_query( insert into index_advisor values ( 10 ), advisor /*portal name*/ );
.) Make it plugin-based. Reason: so that someone else with a better idea can replace this advisor, without having to recompile the server. Difficulty: This code calls many internal functions: index_create(), index_drop(), planner(), etc. That makes it impossible to compile it standalone.
.) Remove the dependency on the global index_candidates, used for communication between indexadvisor.c and plancat.c. Reason: bad coding practice. Difficulty: Even though I was successful in updating pg_class.relpages for the virtual indexes, the planner is still calling smgr.c code to get the number of pages occupied by the index! Hence, I had to use the global the way I did.

Best regards, -- [EMAIL PROTECTED] singh.gurjeet @{ gmail | hotmail | yahoo }.com

patch_and_other_files.tar.gz
Re: [HACKERS] Ontology on PostgreSQL - is there something?
Rodrigo,

Besides, what are your opinions on the subject?

That I don't understand what they're talking about. What's Ontology in a database sense? Can you give some examples?

-- Josh Berkus PostgreSQL @ Sun San Francisco
Re: [HACKERS] [GENERAL] Allowing SYSDATE to Work
Matt,

I now agree completely. My purpose is to migrate Oracle databases to Postgres, and I had thought that Oracle didn't support CURRENT_DATE, CURRENT_TIMESTAMP, and so on. However, I've just learned otherwise. So, I think the proper migration process for a production database would be to first change the Oracle DB to use CURRENT_DATE (or some other standard pseudo column), since that will work properly under both Oracle and Postgres.

Yep, or use the Orafce project. We're happy to support compatibility syntax in completely separate add-in projects. Just not in the core code.

-- Josh Berkus PostgreSQL @ Sun San Francisco
[HACKERS] Nasty VACUUM/bgwriter/segmentation bug
While working on fixing the recently reported hash-index problem, I was using a test build with a very small RELSEG_SIZE (128K), so that I could trigger the reported bug with a reasonably small amount of data. And I started finding some unexpected data corruption. I eventually reduced it to this test case:

checkpoint;
create table foo (f1 int);
insert into foo select x from generate_series(1,10) x;
-- wait 30 seconds
delete from foo;
vacuum verbose foo;
insert into foo select x from generate_series(1,10) x;
\q

stop and restart postmaster, then

vacuum verbose foo;

This vacuum will generate a whole lot of

WARNING: relation foo page 16 is uninitialized --- fixing
WARNING: relation foo page 17 is uninitialized --- fixing
...

and when the dust settles, most of the second batch of 10 rows is gone.

What is happening is that during that 30-second wait, the bgwriter is dumping out all the dirty pages, and acquiring open file references to each segment of table foo as it does so. The VACUUM then truncates foo back to zero size, since it contains no data after the DELETE, and then the second INSERT bulks it up again. The problem is that the bgwriter still has open file references to the deleted segments after the first one, and that's where it'll write the data if given a chance. So the updates disappear into the ether as far as any other process is concerned, for each segment except the first.

There's a rather indirect mechanism that's supposed to prevent similar problems between two backends: a file truncation is supposed to be associated with a forced relcache flush, and that causes an smgrclose(), so other backends will be forced to reopen the file(s) before they can do any more work with the truncated relation.
On reflection I think I don't trust this though, because in the case where a backend writes a dirty buffer because it needs to reclaim a buffer, it doesn't try to open the relation at the relcache level (it cannot, because the rel might be in a different database). So it could use a stale smgr relation, same as the bgwriter. The bgwriter does not participate in shared cache inval, not having a relcache in the first place, and so this mechanism doesn't help it anyway.

This is a fairly low-probability bug in real-world cases, because it could only happen when a relation is truncated and then re-expanded across a 1GB segment boundary. Moreover, because the bgwriter flushes all its open files after each checkpoint, the window for trouble only extends to the next checkpoint. But it definitely could happen, and it might explain some irreproducible corruption reports.

I think that the easiest fix might be to not remove no-longer-used segment files during a truncate, but simply reduce them to zero size rather than delete them. Then any open file pointers aren't invalidated. The only alternative I can see is to invent some new signaling mechanism to force closure of open files, but that seems ugly, complex, and perhaps subject to race conditions. Thoughts?

regards, tom lane
Re: [HACKERS] Ontology on PostgreSQL - is there something?
Josh Berkus wrote:

That I don't understand what they're talking about. What's Ontology in a database sense? Can you give some examples?

You'll need to RTFP, which in all fairness the OP did cite in his posting: http://www.cs.toronto.edu/vldb04/protected/eProceedings/contents/pdf/IND3P1.PDF
Re: [HACKERS] Nasty VACUUM/bgwriter/segmentation bug
Tom Lane wrote:

While working on fixing the recently reported hash-index problem, I was using a test build with a very small RELSEG_SIZE (128K), so that I could trigger the reported bug with a reasonably small amount of data. And I started finding some unexpected data corruption. I eventually reduced it to this test case: checkpoint; create table foo (f1 int); insert into foo select x from generate_series(1,10) x; -- wait 30 seconds delete from foo; vacuum verbose foo; insert into foo select x from generate_series(1,10) x; \q stop and restart postmaster, then vacuum verbose foo; This vacuum will generate a whole lot of WARNING: relation foo page 16 is uninitialized --- fixing WARNING: relation foo page 17 is uninitialized --- fixing ... and when the dust settles, most of the second batch of 10 rows is gone.

What is happening is that during that 30-second wait, the bgwriter is dumping out all the dirty pages, and acquiring open file references to each segment of table foo as it does so. The VACUUM then truncates foo back to zero size, since it contains no data after the DELETE, and then the second INSERT bulks it up again. The problem is that the bgwriter still has open file references to the deleted segments after the first one, and that's where it'll write the data if given a chance. So the updates disappear into the ether as far as any other process is concerned, for each segment except the first.

Does TRUNCATE suffer from the same issue?

There's a rather indirect mechanism that's supposed to prevent similar problems between two backends: a file truncation is supposed to be associated with a forced relcache flush, and that causes an smgrclose(), so other backends will be forced to reopen the file(s) before they can do any more work with the truncated relation. On reflection I think I don't trust this though, because in the case where a backend writes a dirty buffer because it needs to reclaim a buffer, it doesn't try to open the relation at the relcache level (it cannot, because the rel might be in a different database). So it could use a stale smgr relation, same as the bgwriter. The bgwriter does not participate in shared cache inval, not having a relcache in the first place, and so this mechanism doesn't help it anyway. This is a fairly low-probability bug in real-world cases, because it could only happen when a relation is truncated and then re-expanded across a 1GB segment boundary. Moreover, because the bgwriter flushes all its open files after each checkpoint, the window for trouble only extends to the next checkpoint. But it definitely could happen, and it might explain some irreproducible corruption reports.

Regular imports that delete data or truncate relations would increase this probability, wouldn't they? Autovac is also likely to run on a relation in the wait phase, while other relations are being truncated by an import process.

I think that the easiest fix might be to not remove no-longer-used segment files during a truncate, but simply reduce them to zero size rather than delete them. Then any open file pointers aren't invalidated. The only alternative I can see is to invent some new signaling mechanism to force closure of open files, but that seems ugly, complex, and perhaps subject to race conditions. Thoughts?

Seems reasonable from my lowly user point of view. Would there be a requirement to remove the extra segments at any point in the future, or would they hang around on the disk forever?

Russell Smith

regards, tom lane
Re: [HACKERS] Ontology on PostgreSQL - is there something?
On 11/18/06, Rodrigo Hjort [EMAIL PROTECTED] wrote:

Dear PG-hackers, Based on the paper below [1], I ask: is there anyone working on, or already tried to such native implementation on PostgreSQL? I didn't find anything related on pgFoundry. There is also a presentation [2] related to the paper. By Souripriya Das, Eugene Inseok Chong, George Eadon, Jagannathan Srinivasan, Proceedings of the 30th VLDB Conference, Toronto, Canada. Ontologies are increasingly being used to build applications that utilize domain-specific knowledge. This paper addresses the problem of supporting ontology-based semantic matching in RDBMS. The approach enables users to reference ontology data directly from SQL using the semantic match operators, thereby opening up possibilities of combining with other operations such as joins as well as making the ontology-driven applications easy to develop and efficient. In contrast, other approaches use RDBMS only for storage of ontologies and querying of ontology data is typically done via APIs. This paper presents the ontology-related functionality including inferencing, discusses how it is implemented on top of Oracle RDBMS, and illustrates the usage with several database applications. [1] http://www.cs.toronto.edu/vldb04/protected/eProceedings/contents/pdf/IND3P1.PDF [2] http://www-scf.usc.edu/~csci586/ppt-2005/dguo.pps Besides, what are your opinions on the subject? Is it worthful to have such native implementation on PG, even as a modest contrib module? Based on your development know-how on the backend internals, how difficult is it to develop this (3 operators + 1 index type)?

For simple situations of BT/NT ontology relationships, check out the ltree contrib module*. It allows building and indexing of directed graphs and other network-like structures, and seems ideal for both path storage and search. [thinks some more] If applied as a column per axis, ltree may be enough... interesting stuff in any case.
* http://www.sai.msu.su/~megera/postgres/gist/ltree/

-- Best regards, Rodrigo Hjort http://icewall.org/~hjort CELEPAR - Cia de Informática do Paraná - Brasil http://www.pr.gov.br

-- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org
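[Editorial sketch: what the ltree suggestion might look like for broader/narrower-term paths, assuming the contrib/ltree module is installed; table, column, and path names are invented.]

```sql
-- ltree paths for BT/NT ("broader/narrower term") hierarchies (names invented):
CREATE TABLE concept (term text, path ltree);
CREATE INDEX concept_path_idx ON concept USING gist (path);
INSERT INTO concept VALUES ('sedan', 'Vehicle.Car.Sedan');

-- all narrower terms under "Car", using the index:
SELECT term FROM concept WHERE path <@ 'Vehicle.Car';
```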
Re: [HACKERS] Nasty VACUUM/bgwriter/segmentation bug
Russell Smith [EMAIL PROTECTED] writes:

Tom Lane wrote:

What is happening is that during that 30-second wait, the bgwriter is dumping out all the dirty pages, and acquiring open file references to each segment of table foo as it does so. The VACUUM then truncates foo back to zero size, since it contains no data after the DELETE, and then the second INSERT bulks it up again. The problem is that the bgwriter still has open file references to the deleted segments after the first one, and that's where it'll write the data if given a chance. So the updates disappear into the ether as far as any other process is concerned, for each segment except the first.

Does TRUNCATE suffer from the same issue?

No, because TRUNCATE actually substitutes new physical files with different relfilenode numbers, in order to allow rollback if its transaction aborts. VACUUM is at risk, and I think we also use a non-transactional truncate for temp tables, but those shouldn't be at risk either.

Seems reasonable from my lowly user point of view. Would there be a requirement to remove the extra segments at any point in the future or would they hang around on the disk forever?

I'm envisioning that they'd stay there as long as the table exists (though I suppose a TRUNCATE could get rid of 'em). Zero-size files shouldn't be a big problem...

regards, tom lane