Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration

2006-11-18 Thread Simon Riggs
On Sat, 2006-11-18 at 00:13 +0100, Martijn van Oosterhout wrote:
> On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:
>>> Having the supporting code in core does not make much of a difference
>>> otherwise from having it in contrib, does it?
>>
>> Given the nonextensibility of gram.y and keywords.c, it has to be in
>> core to even think about having special syntax :-(
>
> Has anyone ever heard of extensible grammars?

(not specifically answering Martijn...)

The main thought for me on this thread is: Why do we need to invent
*any* grammar to make this work? Why not just use functions?

For PITR we have pg_start_backup() rather than BACKUP START. For
advisory locks we have pg_advisory_lock().

What's wrong with having pg_tsearch_* functions to do everything? There's
nothing wrong with additional catalog tables/columns that are
manipulated by function calls only. We have that already - look at
pg_stat_reset() - no grammar stuff there.

Anybody with an Oracle or SQL Server background is used to seeing system
functions available as function calls; as I've observed above, so are
we. We should keep the grammar clean to allow a very close adherence to
the SQL standard, IMHO.

I would like to see Oleg and Teodor's good work come into core, but I
don't want to see bucketfuls of new grammar issues.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration

2006-11-18 Thread Oleg Bartunov

On Sat, 18 Nov 2006, Simon Riggs wrote:

> On Sat, 2006-11-18 at 00:13 +0100, Martijn van Oosterhout wrote:
>> On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:
>>>> Having the supporting code in core does not make much of a difference
>>>> otherwise from having it in contrib, does it?
>>>
>>> Given the nonextensibility of gram.y and keywords.c, it has to be in
>>> core to even think about having special syntax :-(
>>
>> Has anyone ever heard of extensible grammars?
>
> (not specifically answering Martijn...)
>
> The main thought for me on this thread is: Why do we need to invent
> *any* grammar to make this work? Why not just use functions?
>
> For PITR we have pg_start_backup() rather than BACKUP START. For
> advisory locks we have pg_advisory_lock().
>
> What's wrong with having pg_tsearch_* functions to do everything? There's
> nothing wrong with additional catalog tables/columns that are
> manipulated by function calls only. We have that already - look at
> pg_stat_reset() - no grammar stuff there.
>
> Anybody with an Oracle or SQL Server background is used to seeing system
> functions available as function calls; as I've observed above, so are
> we. We should keep the grammar clean to allow a very close adherence to
> the SQL standard, IMHO.
>
> I would like to see Oleg and Teodor's good work come into core, but I
> don't want to see bucketfuls of new grammar issues.


Summarizing, we have two questions:

1. Will tsearch come into core?
2. Do we need grammar changes?

I hope we have consensus on 1. - we need FTS as a core feature.
The second question is not as fundamental; that's why we asked -hackers.

So, if we don't touch the grammar, are there any issues with tsearch2 in core?

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration

2006-11-18 Thread Peter Eisentraut
Oleg Bartunov wrote:
> So, if we'll not touch grammar, are there any issues with tsearch2 in
> core ?

Are there any issues with tsearch2 not in core?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Fwd: [HACKERS] Remove contrib version of rtree_gist --- now in core system ?

2006-11-18 Thread jorge alberto

-- Forwarded message --
From: jorge alberto [EMAIL PROTECTED]
Date: Nov 17, 2006 2:09 PM
Subject: Re: [HACKERS] Remove contrib version of rtree_gist --- now in core
system ?
To: Tom Lane [EMAIL PROTECTED]

hi!
thanks for writing!

In PostgreSQL version 8.0.9 the rtree_gist implementation is located at
contrib/rtree_gist, ready for compilation and loading into a database.
I want to know where I can find this rtree_gist implementation in
versions 8.1.x.

regards
jorge

On 11/14/06, Tom Lane [EMAIL PROTECTED] wrote:


jorge alberto [EMAIL PROTECTED] writes:
> I have read that since version 8.1 the contrib version of rtree_gist is
> not in /contrib anymore, now it is in the core system, but where can
> I find it?

You don't need to find it, it's built in.  Just create an index.

regards, tom lane



Re: [HACKERS] Custom Data Type Question

2006-11-18 Thread Simon Riggs
On Wed, 2006-11-15 at 16:38 -0500, Andrew Dunstan wrote:

> My little enumkit tool allows you to create enumerations today very
> easily, but its values are completely hardcoded. However, the above
> trick still works. The downside is that each enumeration type requires a
> tiny bit of compilation.

Andrew,

Your enum sounds good, apart from the hardcoded/compilation thing. That
is a data management nightmare AFAICS and so restricts the usefulness of
the solution.

It would be much better to read things dynamically into an array, so
using an init function in *preload_libraries would work well.

I'd also love any suggestions as to how we might be able to use a
similar local-data-caching mechanism when we specify SQL-standard
FOREIGN KEYs. It would be really cool to say USING LOCAL CACHE
or have some other way of avoiding the overhead of all those stored AFTER
triggers and SPI SELECT statements when we've got checks against tables
with only a few rows whose values hardly ever change. The enum concept
departs radically from the declarative referential integrity concepts that
many of us are already used to. I'd like to be able to speed things up
without radical re-design of the database... so a few nicely sprinkled
ALTER TABLE statements would be a much better way of implementing this,
IMHO.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] [GENERAL] Shutting down a warm standby database in 8.2beta3

2006-11-18 Thread Stephen Harris
On Fri, Nov 17, 2006 at 11:40:36PM -0500, Tom Lane wrote:
> Stephen Harris [EMAIL PROTECTED] writes:
>> Why not, after calling fork(), create a new process group with setsid() and
>> then instead of killing the recovery thread, kill the whole process group
>> (-PID rather than PID)?  Then every process (the recovery thread, the
>> system, the script, any child of the script) will all receive the signal.
>
> This seems like a good answer if setsid and/or setpgrp are universally
> available.  I fear it won't work on Windows though :-(.  Also, each

It's POSIX, so I would suppose it's standard on most modern *nix
platforms.  Windows... bluh.  I wonder how perl handles POSIX::setsid()
on Windows!

> backend would become its own process group leader --- does anyone know
> if adding hundreds of process groups would slow down any popular
> kernels?

Shouldn't hurt.  This is, after all, what using & in a command-line
shell with job control (csh, ksh, tcsh, bash, zsh) does.  Because you only
run one archive or recovery thread at a time (which is very good and very
clever) you won't have too many process groups at any instant in time.

> [ thinks for a bit... ]  Another issue is that there'd be a race
> condition during backend start: if the postmaster tries to kill -PID
> before the backend has managed to execute setsid, it wouldn't work.

*ponder*  Bugger.  Standard solutions (e.g. retry three times with a
one-second pause) would mitigate this, but...  Hmm.
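A minimal sketch of the arrangement being proposed (POSIX only, which matches the Windows caveat above; the 0.5 s sleep is a crude stand-in for real synchronization around the setsid() race just mentioned):

```python
import os
import signal
import time

def spawn_group_member():
    """Fork a child that starts its own process group via setsid(),
    mirroring the proposal: the parent can later signal the whole
    group with kill(-PID) / killpg()."""
    pid = os.fork()
    if pid == 0:
        os.setsid()        # child becomes leader of a new session/group
        time.sleep(30)     # stand-in for the recovery script's work
        os._exit(0)
    return pid

pid = spawn_group_member()
time.sleep(0.5)            # crude guard against the setsid() race
os.killpg(pid, signal.SIGTERM)   # after setsid(), the child's pgid == its pid
_, status = os.waitpid(pid, 0)
print(os.WIFSIGNALED(status) and os.WTERMSG(status) == signal.SIGTERM
      if hasattr(os, "WTERMSG") else os.WTERMSIG(status) == signal.SIGTERM)
```

Any grandchildren forked by the child (the script, its children) inherit the group and receive the signal too, which is the point of the proposal.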

Another idea is to make the shutdown more co-operative, under the control
of the script; e.g. an exit code of 0 means the xlog is now available, an
exit code of 1 means the log is non-existent (so recovery is complete), and
an exit code of 255 means failure to recover; perform database shutdown.
In this way a solution similar to the existing trigger files (recovery
complete) could be used.  It's a little messy in that pg_ctl wouldn't
be used to shut down the database; the script would essentially tell
the recovery thread to abort, which would tell the main postmaster to
shut down.  We'd have no clients connected, no child processes running,
so a smart shutdown would work.
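As a sketch, a supervisor loop could map the restore script's exit status to an action like this (the codes and action names are purely illustrative of the idea above, not an agreed protocol):

```python
def next_action(exit_code: int) -> str:
    """Map the restore script's exit status to a recovery action,
    per the hypothetical protocol sketched above."""
    if exit_code == 0:
        return "apply-xlog"      # segment restored: keep recovering
    if exit_code == 1:
        return "end-recovery"    # no such segment: recovery is complete
    if exit_code == 255:
        return "shutdown"        # unrecoverable failure: stop the database
    raise ValueError(f"unexpected restore exit code: {exit_code}")

print([next_action(c) for c in (0, 1, 255)])
# ['apply-xlog', 'end-recovery', 'shutdown']
```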

-- 

rgds
Stephen



Re: [HACKERS] [GENERAL] Allowing SYSDATE to Work

2006-11-18 Thread Matt Miller
>> Can't keywords share code
>
> the way to do what you want I think is
> like this:
>
> foo: bar_or_baz
>         { code block }
>     ;
>
> bar_or_baz: bar | baz ;

I'll try that, thanks.



Re: [HACKERS] [GENERAL] Allowing SYSDATE to Work

2006-11-18 Thread Matt Miller
> I suggest you contribute this kind of code to the orafce project [1]

Thanks, I'll go play over there for a while.



Re: [HACKERS] [GENERAL] Allowing SYSDATE to Work

2006-11-18 Thread Matt Miller
>> I found it interesting that gram.c and parse.h already supported SYSDATE.
>
> Only after you ran bison ;-).  They're derived files.

Well, so much for my conspiracy theory.

Thanks for the bison lesson.



Re: [HACKERS] Custom Data Type Question

2006-11-18 Thread Andrew Dunstan



Simon Riggs wrote:
> On Wed, 2006-11-15 at 16:38 -0500, Andrew Dunstan wrote:
>> My little enumkit tool allows you to create enumerations today very
>> easily, but its values are completely hardcoded. However, the above
>> trick still works. The downside is that each enumeration type requires a
>> tiny bit of compilation.
>
> Andrew,
>
> Your enum sounds good, apart from the hardcoded/compilation thing. That
> is a data management nightmare AFAICS and so restricts the usefulness of
> the solution.

Simon, Tom Dunstan has submitted a patch for first-class enum types that
do not have the compilation requirement - it's in the queue for 8.3. You
might want to review that.


cheers

andrew



Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration

2006-11-18 Thread Markus Schiltknecht

Hi,

Peter Eisentraut wrote:
> Are there any issues with tsearch2 not in core?


I have run into trouble when restoring a dump, especially across
different versions of PostgreSQL and tsearch2, mainly because the pg_ts_*
tables are not system tables and thus need to be restored or installed
separately.

And there is still the packaging issue, which needs to be addressed. It's
not complicated, but it's a PITA to compile stemmers and set up custom
dictionaries.


What's really needed IMO is clever packaging, including stemmers and
dictionaries for as many languages as we can come up with. So on a
Debian system, it should become as simple as:

apt-get install postgresql-contrib-8.3
apt-get install postgresql-language-pack-english-8.3
apt-get install postgresql-language-pack-german-8.3
apt-get install postgresql-language-pack-russian-8.3

Inclusion into core surely does not help with that.

Relabeling contrib to modules or extras or something would probably give 
some people a warm fuzzy feeling when installing. OTOH, these are 
probably the very same people who get excited about tsearch2 in core, so 
if we want to satisfy them, we better put it right into core... I dunno.


Regards

Markus



Re: [HACKERS] [GENERAL] Allowing SYSDATE to Work

2006-11-18 Thread Matt Miller
>> Why should we add this Oracle-ism to PostgreSQL? It doesn't add any new
>> feature.
>
> Certainly, this feature falls well within the class of completely
> gratuitous proprietary extensions that we typically reject.

I now agree completely.  My purpose is to migrate Oracle databases to
Postgres, and I had thought that Oracle didn't support CURRENT_DATE,
CURRENT_TIMESTAMP, and so on.  However, I've just learned otherwise. So,
I think the proper migration process for a production database would be
to first change the Oracle DB to use CURRENT_DATE (or some other
standard pseudo-column), since that will work properly under both Oracle
and Postgres.



Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration

2006-11-18 Thread Dave Page


> --- Original Message ---
> From: Peter Eisentraut [EMAIL PROTECTED]
> To: Jeremy Drake [EMAIL PROTECTED]
> Sent: 18/11/06, 07:30:48
> Subject: Re: [HACKERS] Proposal: syntax of operation with tsearch's
> configuration
>
> Jeremy Drake wrote:
>> I am currently in the position that my hosting provider is
>> apprehensive about installing modules in contrib because they believe
>> they are less secure.
>
> Using irrational and unfounded statements one can of course make
> arguments for just about anything, but that won't help us.

Unfounded? pg_file_write() is in contrib, not core, precisely because some
considered it to be a possible risk (despite COPY being as dangerous) and
wanted to make it a non-default option.

Personally I think a GUC to disable it, COPY, and other potentially dangerous
utility functions would be preferable, but that's a whole other argument.

Regards, Dave



Re: [HACKERS] Proposal: syntax of operation with

2006-11-18 Thread Andrew Dunstan
Peter Eisentraut wrote:
> Oleg Bartunov wrote:
>> So, if we'll not touch grammar, are there any issues with tsearch2 in
>> core ?
>
> Are there any issues with tsearch2 not in core?



Quite apart from anything else, it really needs documentation up to the
standard we give other core features.

I think if a feature will be of sufficiently general use it should be a
candidate for inclusion, and text search certainly comes within that
category in my mind.

cheers

andrew




Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration

2006-11-18 Thread Oleg Bartunov

On Sat, 18 Nov 2006, Andrew Dunstan wrote:

> Peter Eisentraut wrote:
>> Oleg Bartunov wrote:
>>> So, if we'll not touch grammar, are there any issues with tsearch2 in
>>> core ?
>>
>> Are there any issues with tsearch2 not in core?
>
> Quite apart from anything else, it really needs documentation of the
> standard we give other core features.



Sure. I just learned how to build the pg documentation (successfully) and
am researching what the documentation standard is. Should we write a
separate full text search chapter and/or add descriptions to the relevant
chapters?



> I think if a feature will be of sufficiently general use it should be a
> candidate for inclusion, and text search certainly comes within that
> category in my mind.

It could help us in Pg-MySQL discussions, at least, since we beat
MySQL's FTS :)


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



[HACKERS] Ontology on PostgreSQL - is there something?

2006-11-18 Thread Rodrigo Hjort

Dear PG-hackers,

Based on the paper below [1], I ask: is anyone working on, or has anyone
already attempted, such a native implementation on PostgreSQL? I didn't find
anything related on pgFoundry. There is also a presentation [2] related to
the paper.

By Souripriya Das, Eugene Inseok Chong, George Eadon, Jagannathan
Srinivasan, Proceedings of the 30th VLDB Conference, Toronto, Canada.
Ontologies are increasingly being used to build applications that utilize
domain-specific knowledge. This paper addresses the problem of supporting
ontology-based semantic matching in RDBMS. The approach enables users to
reference ontology data directly from SQL using the semantic match
operators, thereby opening up possibilities of combining with other
operations such as joins as well as making the ontology-driven applications
easy to develop and efficient. In contrast, other approaches use RDBMS only
for storage of ontologies and querying of ontology data is typically done
via APIs. This paper presents the ontology-related functionality including
inferencing, discusses how it is implemented on top of Oracle RDBMS, and
illustrates the usage with several database applications.

[1]
http://www.cs.toronto.edu/vldb04/protected/eProceedings/contents/pdf/IND3P1.PDF
[2] http://www-scf.usc.edu/~csci586/ppt-2005/dguo.pps

Besides, what are your opinions on the subject?

Is it worthwhile to have such a native implementation in PG, even as a
modest contrib module?
Based on your know-how of the backend internals, how difficult would it be
to develop this (3 operators + 1 index type)?

--
Best regards,

Rodrigo Hjort
http://icewall.org/~hjort
CELEPAR - Cia de Informática do Paraná - Brasil
http://www.pr.gov.br


Re: [HACKERS] Proposal: syntax of operation with

2006-11-18 Thread Stefan Kaltenbrunner
Andrew Dunstan wrote:
> Peter Eisentraut wrote:
>> Oleg Bartunov wrote:
>>> So, if we'll not touch grammar, are there any issues with tsearch2 in
>>> core ?
>>
>> Are there any issues with tsearch2 not in core?
>
> Quite apart from anything else, it really needs documentation of the
> standard we give other core features.
>
> I think if a feature will be of sufficiently general use it should be a
> candidate for inclusion, and text search certainly comes within that
> category in my mind.

I agree here - full text search is of general use (and a very often
requested feature) - including it in core will both help us in marketing
PostgreSQL (which should not be seen as a bad thing at all) and, more to
the point, it provides an in-core user and showcase for two very powerful
and innovative technologies - GiST and GIN.


Stefan



Re: [HACKERS] Proposal: syntax of operation with tsearch's configuration

2006-11-18 Thread Stefan Kaltenbrunner
Alvaro Herrera wrote:
> Oleg Bartunov wrote:
>> On Fri, 17 Nov 2006, Andrew Dunstan wrote:
>>> I am also a bit concerned that the names of the proposed objects (parser,
>>> dictionary) don't convey their purpose adequately. Maybe TS_DICTIONARY and
>>> TS_PARSER might be better if we in fact need to name them.
>> this looks reasonable to me.
>
> Huh, but we don't use keywords with ugly abbreviations and underscores.
> How about FULLTEXT DICTIONARY and FULLTEXT PARSER?  (Using
> FULLTEXT instead of FULL TEXT means you don't create common
> reserved words, and furthermore you don't collide with an existing type
> name.)

sounds fine

> I also think the thousands of lines is an exaggeration :-)  The
> grammar should take a couple dozen at most.  The rest of the code would
> go to their own files.
>
> We should also take the opportunity to discuss new keywords for the XML
> support -- will we use new grammar, or functions?
 

that is a good question and we should decide on a direction for that -
we already have a feature in shipping code that causes quite some
confusion in that regard (autovacuum).
What I see from supporting/consulting people is that more and
more people are adopting autovacuum for their databases, but those with
complex ones want to override it on a per-table basis.
We already provide a rather crude interface for that - namely, manually
inserting some rows into a system table - which is confusing the heck out
of people (they are responding with "why is there no ALTER
AUTOVACUUM SET ..." or asking for an equivalent pg_* function).
I'm not too sure what the most suitable interface for that would be, but
finding a consistent solution might be good nevertheless.


Stefan



Re: [HACKERS] [Fwd: Index Advisor]

2006-11-18 Thread Kai-Uwe Sattler

Hi Gurjeet,
I will look at the pg_advise bug and will send a patch ASAP.
Best,
  Kai

On 15.11.2006, at 15:34, Gurjeet Singh wrote:

> BUGS:
> =====
> .) The SELECTs in pg_advise are returning wrong results when
>    the same index is suggested twice, because of the SUM() aggregates.
> .) I doubt that on a table t(a,b), for a suggestion of idx(b,a),
>    pg_advise will suggest idx(a,b);
>
> Wish-list:
> ==========
> .) Make pg_indexadvisor a user table.
>    Reason: a normal user cannot do delete from pg_indexadvisor.
>    Difficulty: Need to know how to do
>        insert into pg_indexadvisor values( 1, ...)
>    from within the backend; that is, need to study/invent RSI
>    (Recursive SQL Interface).
>    Trial code can be seen by searching for:
>        exec_simple_query( insert into index_advisor values ( 10 ),
>            advisor /*portal name*/ );
>
> .) Make it plugin-based.
>    Reason: so that someone else with a better idea can replace
>    this advisor, without having to recompile the server.
>    Difficulty: This code calls many internal functions:
>        index_create(), index_drop(), planner(), etc.
>    That makes it impossible to compile it standalone.
>
> .) Remove the dependency on the global index_candidates, used for
>    communication between indexadvisor.c and plancat.c.
>    Reason: Bad coding practice.
>    Difficulty: Even though I was successful in updating
>    pg_class.relpages for the virtual indexes, the planner is still
>    calling smgr.c code to get the number of pages occupied by the
>    index!  Hence, I had to use the global the way I did.

Best regards,

--
[EMAIL PROTECTED]
singh.gurjeet @{ gmail | hotmail | yahoo }.com
patch_and_other_files.tar.gz





Re: [HACKERS] Ontology on PostgreSQL - is there something?

2006-11-18 Thread Josh Berkus
Rodrigo,

 Besides, what are your opinions on the subject?

That I don't understand what they're talking about.  What's an ontology in
a database sense?  Can you give some examples?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] [GENERAL] Allowing SYSDATE to Work

2006-11-18 Thread Josh Berkus
Matt,

 I now agree completely.  My purpose is to migrate Oracle databases to
 Posgres, and I had thought that Oracle didn't support CURRENT_DATE,
 CURRENT_TIMESTAMP, and so on.  However, I've just learned otherwise. So,
 I think the proper migration process for a production database would be
 to first change the Oracle DB to use CURRENT_DATE (or some other
 standard psuedo column), since that will work properly under both Oracle
 and Postgres.

Yep, or use the Orafce project.  We're happy to support compatibility syntax
in completely separate add-in projects.  Just not in the core code.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



[HACKERS] Nasty VACUUM/bgwriter/segmentation bug

2006-11-18 Thread Tom Lane
While working on fixing the recently reported hash-index problem,
I was using a test build with a very small RELSEG_SIZE (128K),
so that I could trigger the reported bug with a reasonably small
amount of data.  And I started finding some unexpected data corruption.
I eventually reduced it to this test case:

checkpoint;
create table foo (f1 int);
insert into foo select x from generate_series(1,10) x;
-- wait 30 seconds
delete from foo;
vacuum verbose foo;
insert into foo select x from generate_series(1,10) x;
\q
stop and restart postmaster, then
vacuum verbose foo;

This vacuum will generate a whole lot of 
WARNING:  relation foo page 16 is uninitialized --- fixing
WARNING:  relation foo page 17 is uninitialized --- fixing
...
and when the dust settles, most of the second batch of 10 rows
is gone.

What is happening is that during that 30-second wait, the bgwriter is
dumping out all the dirty pages, and acquiring open file references
to each segment of table foo as it does so.  The VACUUM then truncates
foo back to zero size, since it contains no data after the DELETE,
and then the second INSERT bulks it up again.  The problem is that the
bgwriter still has open file references to the deleted segments after
the first one, and that's where it'll write the data if given a chance.
So the updates disappear into the ether as far as any other process is
concerned, for each segment except the first.

There's a rather indirect mechanism that's supposed to prevent similar
problems between two backends: a file truncation is supposed to be
associated with a forced relcache flush, and that causes an smgrclose(),
so other backends will be forced to reopen the file(s) before they can
do any more work with the truncated relation.  On reflection I think
I don't trust this though, because in the case where a backend writes a
dirty buffer because it needs to reclaim a buffer, it doesn't try to
open the relation at the relcache level (it cannot, because the rel
might be in a different database).  So it could use a stale smgr
relation, same as the bgwriter.  The bgwriter does not participate
in shared cache inval, not having a relcache in the first place, and
so this mechanism doesn't help it anyway.

This is a fairly low-probability bug in real-world cases, because it
could only happen when a relation is truncated and then re-expanded
across a 1GB segment boundary.  Moreover, because the bgwriter flushes
all its open files after each checkpoint, the window for trouble only
extends to the next checkpoint.  But it definitely could happen, and
it might explain some irreproducible corruption reports.
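The failure mode here - a long-lived file descriptor surviving an unlink-and-recreate of the same path - can be reproduced in miniature outside Postgres. The sketch below is illustrative only (the segment file name is made up) and assumes a POSIX filesystem:

```python
import os
import tempfile

# A long-lived handle, standing in for the bgwriter's cached segment fd.
path = os.path.join(tempfile.mkdtemp(), "foo.1")   # hypothetical segment name
with open(path, "wb") as f:
    f.write(b"old segment data")
stale = open(path, "r+b")                          # the cached descriptor

# Another process "truncates" the relation by unlinking the segment,
# then recreates it as the table grows again.
os.unlink(path)
with open(path, "wb") as f:
    f.write(b"new segment data")

# The stale handle still refers to the deleted inode, so this write
# disappears into the ether as far as anyone opening `path` is concerned.
stale.seek(0)
stale.write(b"UPDATED!")
stale.flush()
stale.close()

print(open(path, "rb").read())    # prints b'new segment data'
```

Truncating the file to zero length in place instead of unlinking it keeps the old descriptor attached to the same inode, so later writes through it land where other readers can see them - which is the intuition behind the fix proposed below.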

I think that the easiest fix might be to not remove no-longer-used
segment files during a truncate, but simply reduce them to zero size
rather than delete them.  Then any open file pointers aren't
invalidated.  The only alternative I can see is to invent some new
signaling mechanism to force closure of open files, but that seems
ugly, complex, and perhaps subject to race conditions.

Thoughts?

regards, tom lane



Re: [HACKERS] Ontology on PostgreSQL - is there something?

2006-11-18 Thread David Boreham

Josh Berkus wrote:
> That I don't understand what they're talking about.   What's Ontology in a
> database sense?   Can you give some examples?

You'll need to RTFP, which in all fairness the OP did cite in his posting:

http://www.cs.toronto.edu/vldb04/protected/eProceedings/contents/pdf/IND3P1.PDF




Re: [HACKERS] Nasty VACUUM/bgwriter/segmentation bug

2006-11-18 Thread Russell Smith

Tom Lane wrote:
> While working on fixing the recently reported hash-index problem,
> I was using a test build with a very small RELSEG_SIZE (128K),
> so that I could trigger the reported bug with a reasonably small
> amount of data.  And I started finding some unexpected data corruption.
> I eventually reduced it to this test case:
>
> checkpoint;
> create table foo (f1 int);
> insert into foo select x from generate_series(1,10) x;
> -- wait 30 seconds
> delete from foo;
> vacuum verbose foo;
> insert into foo select x from generate_series(1,10) x;
> \q
> stop and restart postmaster, then
> vacuum verbose foo;
>
> This vacuum will generate a whole lot of
> WARNING:  relation foo page 16 is uninitialized --- fixing
> WARNING:  relation foo page 17 is uninitialized --- fixing
> ...
> and when the dust settles, most of the second batch of 10 rows
> is gone.
>
> What is happening is that during that 30-second wait, the bgwriter is
> dumping out all the dirty pages, and acquiring open file references
> to each segment of table foo as it does so.  The VACUUM then truncates
> foo back to zero size, since it contains no data after the DELETE,
> and then the second INSERT bulks it up again.  The problem is that the
> bgwriter still has open file references to the deleted segments after
> the first one, and that's where it'll write the data if given a chance.
> So the updates disappear into the ether as far as any other process is
> concerned, for each segment except the first.

Does TRUNCATE suffer from the same issue?

> There's a rather indirect mechanism that's supposed to prevent similar
> problems between two backends: a file truncation is supposed to be
> associated with a forced relcache flush, and that causes an smgrclose(),
> so other backends will be forced to reopen the file(s) before they can
> do any more work with the truncated relation.  On reflection I think
> I don't trust this though, because in the case where a backend writes a
> dirty buffer because it needs to reclaim a buffer, it doesn't try to
> open the relation at the relcache level (it cannot, because the rel
> might be in a different database).  So it could use a stale smgr
> relation, same as the bgwriter.  The bgwriter does not participate
> in shared cache inval, not having a relcache in the first place, and
> so this mechanism doesn't help it anyway.
>
> This is a fairly low-probability bug in real-world cases, because it
> could only happen when a relation is truncated and then re-expanded
> across a 1GB segment boundary.  Moreover, because the bgwriter flushes
> all its open files after each checkpoint, the window for trouble only
> extends to the next checkpoint.  But it definitely could happen, and
> it might explain some irreproducible corruption reports.

Regular imports that delete data or truncate relations would increase
this probability, wouldn't they?
Autovac is also likely to run on a relation in the wait phase while
other relations are being truncated by an import process.

> I think that the easiest fix might be to not remove no-longer-used
> segment files during a truncate, but simply reduce them to zero size
> rather than delete them.  Then any open file pointers aren't
> invalidated.  The only alternative I can see is to invent some new
> signaling mechanism to force closure of open files, but that seems
> ugly, complex, and perhaps subject to race conditions.
>
> Thoughts?

Seems reasonable from my lowly user point of view.  Would there be a
requirement to remove the extra segments at any point in the future, or
would they hang around on the disk forever?

Russell Smith


Re: [HACKERS] Ontology on PostgreSQL - is there something?

2006-11-18 Thread Mike Rylander

On 11/18/06, Rodrigo Hjort [EMAIL PROTECTED] wrote:
> Dear PG-hackers,
>
> Based on the paper below [1], I ask: is there anyone working on, or already
> tried to such native implementation on PostgreSQL? I didn't find anything
> related on pgFoundry. There is also a presentation [2] related to the paper.
>
> [...]
>
> [1]
> http://www.cs.toronto.edu/vldb04/protected/eProceedings/contents/pdf/IND3P1.PDF
> [2] http://www-scf.usc.edu/~csci586/ppt-2005/dguo.pps
>
> Besides, what are your opinions on the subject?
>
> Is it worthful to have such native implementation on PG, even as a modest
> contrib module?
> Based on your development know-how on the backend internals, how difficult
> is it to develop this (3 operators + 1 index type)?

For simple situations of BT/NT ontology relationships, check out the
ltree contrib module*.  It allows building and indexing of directed
graphs and other network-like structures, and seems ideal for both
path storage and search.

[thinks some more]

If applied as a column per axis, ltree may be enough ... interesting
stuff in any case.

* http://www.sai.msu.su/~megera/postgres/gist/ltree/

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org



Re: [HACKERS] Nasty VACUUM/bgwriter/segmentation bug

2006-11-18 Thread Tom Lane
Russell Smith [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> What is happening is that during that 30-second wait, the bgwriter is
>> dumping out all the dirty pages, and acquiring open file references
>> to each segment of table foo as it does so.  The VACUUM then truncates
>> foo back to zero size, since it contains no data after the DELETE,
>> and then the second INSERT bulks it up again.  The problem is that the
>> bgwriter still has open file references to the deleted segments after
>> the first one, and that's where it'll write the data if given a chance.
>> So the updates disappear into the ether as far as any other process is
>> concerned, for each segment except the first.
>
> Does TRUNCATE suffer from the same issue?

No, because TRUNCATE actually substitutes new physical files with
different relfilenode numbers, in order to allow rollback if its
transaction aborts.  VACUUM is at risk, and I think we also use a
non-transactional truncate for temp tables, but those shouldn't be at
risk either.

> Seems reasonable from my lowly user point of view.  Would there be a
> requirement to remove the extra segments at any point in the future or
> would they hang around on the disk forever?

I'm envisioning that they'd stay there as long as the table exists
(though I suppose a TRUNCATE could get rid of 'em).  Zero-size files
shouldn't be a big problem...

regards, tom lane
