Re: [HACKERS] Updated tsearch documentation

2007-07-27 Thread Oleg Bartunov

On Thu, 26 Jul 2007, Bruce Momjian wrote:


Oleg Bartunov wrote:

Bruce,

I sent you link to my wiki page with summary of changes
http://www.sai.msu.su/~megera/wiki/ts_changes

Your documentation looks rather old.


I have updated it to reflect your changes:

http://momjian.us/expire/fulltext/HTML/textsearch-tables.html



Bruce, I noticed you miss many changes. For example,


options for stemmer has changed (it's documented in my ts_changes), 
so in 
http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-CONFIGURATION


ALTER TEXT SEARCH DICTIONARY en_stem SET OPTION 'english-utf8.stop';

should be


ALTER TEXT SEARCH DICTIONARY en_stem SET OPTION 
'StopFile=english-utf8.stop, Language=english';



Also, this is wrong

DROP TEXT SEARCH CONFIGURATION MAPPING ON pg FOR email, url, sfloat, uri, float;

it should be

ALTER TEXT SEARCH CONFIGURATION pg DROP MAPPING FOR email, url, sfloat, uri, 
float;

Configuration now doesn't have DEFAULT flag, so \dF should not display 'Y'


= \dF
pg_catalog | russian  | Y
public | pg   | Y


This is what I see now

postgres=# \dF public.*
List of fulltext configurations
 Schema | Name | Description
+--+-
 public | pg   |





---




Oleg
On Tue, 24 Jul 2007, Bruce Momjian wrote:



I have added more documentation to try to show how full text search is
used by user tables.  I think this the documentaiton is almost done:

http://momjian.us/expire/fulltext/HTML/textsearch-tables.html

---

Oleg Bartunov wrote:

On Wed, 18 Jul 2007, Bruce Momjian wrote:


Oleg, Teodor,

I am confused by the following example.  How does gin know to create a
tsvector, or does it?  Does gist know too?


No, gist doesn't know. I don't remember why, Teodor ?

For GIN see http://archives.postgresql.org/pgsql-hackers/2007-05/msg00625.php
for discussion



FYI, at some point we need to chat via instant messenger or IRC to
discuss the open items.  My chat information is here:

http://momjian.us/main/contact.html


I send you invitation for google talk, I use only chat in gmail.
My gmail account is [EMAIL PROTECTED]



---

SELECT title
FROM pgweb
WHERE textcat(title,body) @@ plainto_tsquery('create table')
ORDER BY dlm DESC LIMIT 10;

CREATE INDEX pgweb_idx ON pgweb USING gin(textcat(title,body));




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Updated tsearch documentation

2007-07-27 Thread Oleg Bartunov

On Thu, 26 Jul 2007, Bruce Momjian wrote:


Oleg Bartunov wrote:

On Wed, 25 Jul 2007, Erikjan wrote:


In
http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT

it says:

A document is any text file that can be opened, read, and modified.


OOps, in my original documentation it was:
Document, in usual meaning, is a text file, that one could open, read and 
modify.
I stress that in database document is something another.

http://www.sai.msu.su/~megera/postgres/fts/doc/fts-whatdb.html


I have updated the documentation:


http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT



Is't worth to reference OpenFTS which used for indexing file system ?


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] default_text_search_config and expression indexes

2007-07-27 Thread Pavel Stehule

 1) Document the problem and do nothing else.
 2) Make default_text_search_config a postgresql.conf-only
setting, thereby making it impossible to change by non-super
users, or make it a super-user-only setting.
 3) Remove default_text_search_config and require the
configuration to be specified in each function call.


Hello,

2+.

Regards
Pavel Stehule

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] default_text_search_config and expression indexes

2007-07-27 Thread Oleg Bartunov

On Fri, 27 Jul 2007, Pavel Stehule wrote:



1) Document the problem and do nothing else.
2) Make default_text_search_config a postgresql.conf-only
   setting, thereby making it impossible to change by non-super
   users, or make it a super-user-only setting.
3) Remove default_text_search_config and require the
   configuration to be specified in each function call.



Hello,

2+.



One of the most important purpose of integrating tsearch2 was to 
facilitate full-text search for people in hosting environment. Usually,

they have no superuser rights. I'm asking don't forget about them !

There is no problem with current behaviour once user understand what he do.


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] default_text_search_config and expression indexes

2007-07-27 Thread Oleg Bartunov

On Fri, 27 Jul 2007, Pavel Stehule wrote:


2007/7/27, Oleg Bartunov [EMAIL PROTECTED]:

On Fri, 27 Jul 2007, Pavel Stehule wrote:



1) Document the problem and do nothing else.
2) Make default_text_search_config a postgresql.conf-only
   setting, thereby making it impossible to change by non-super
   users, or make it a super-user-only setting.
3) Remove default_text_search_config and require the
   configuration to be specified in each function call.



Hello,

2+.



One of the most important purpose of integrating tsearch2 was to
facilitate full-text search for people in hosting environment. Usually,
they have no superuser rights. I'm asking don't forget about them !

There is no problem with current behaviour once user understand what he do.




I am not sure if postgresql is well for multilangual hosting
environment. There is problem with locales. Without COLLATE support
postgresql can't be used in similar environment. :(



configuration has NOTHING with language ! This is a most frequent myth about
configuration. It's just the way we chose for default_text_search_config to
use language part of locale at initdb time.
text search configuration is just a bind between parser to use for
breaking document by lexems and mapping between lexeme type and dictionaries.



nice a day
Pavel Stehule



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] default_text_search_config and expression indexes

2007-07-27 Thread Pavel Stehule
2007/7/27, Oleg Bartunov [EMAIL PROTECTED]:
 On Fri, 27 Jul 2007, Pavel Stehule wrote:

 
  1) Document the problem and do nothing else.
  2) Make default_text_search_config a postgresql.conf-only
 setting, thereby making it impossible to change by non-super
 users, or make it a super-user-only setting.
  3) Remove default_text_search_config and require the
 configuration to be specified in each function call.
 
 
  Hello,
 
  2+.


 One of the most important purpose of integrating tsearch2 was to
 facilitate full-text search for people in hosting environment. Usually,
 they have no superuser rights. I'm asking don't forget about them !

 There is no problem with current behaviour once user understand what he do.



I am not sure if postgresql is well for multilangual hosting
environment. There is problem with locales. Without COLLATE support
postgresql can't be used in similar environment. :(

nice a day
Pavel Stehule

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] stats_block_level

2007-07-27 Thread Peter Eisentraut
Tom Lane wrote:
  Any reason not to just fold them both into stats_start_collector ?

 Well, then you couldn't turn collection on and off without restarting
 the postmaster, which might be a pain.

Maybe we don't actually need stats_start_collector, but instead we start 
it always and just have one knob to turn collection on and off.  I'm 
not sure whether the extra process would bother people if they're not 
collecting, but we have so many extra processes now, why would anyone 
care.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] default_text_search_config and expression indexes

2007-07-27 Thread Pavel Stehule
 configuration has NOTHING with language ! This is a most frequent myth about
 configuration. It's just the way we chose for default_text_search_config to
 use language part of locale at initdb time.
 text search configuration is just a bind between parser to use for
 breaking document by lexems and mapping between lexeme type and dictionaries.

 

I spoke about impossibility well configuration of postgresql without
administrator's rights. For my czech environment is administrator's
rights necessary too, because czech dictionary aren't in default
installation.

Regards
Pavel Stehule

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] stats_block_level

2007-07-27 Thread Dave Page
Simon Riggs wrote:
 On Fri, 2007-07-27 at 04:29 -0400, Alvaro Herrera wrote:
 Peter Eisentraut wrote:
 Tom Lane wrote:
 Any reason not to just fold them both into stats_start_collector ?
 Well, then you couldn't turn collection on and off without restarting
 the postmaster, which might be a pain.
 Maybe we don't actually need stats_start_collector, but instead we start 
 it always and just have one knob to turn collection on and off.  I'm 
 not sure whether the extra process would bother people if they're not 
 collecting, but we have so many extra processes now, why would anyone 
 care.
 I agree.  Let's remove stats_start_collector and merge the other two
 into a single setting.  Anything more than that is overkill.

 Having a single idle process is not a problem to anyone.  It just sleeps
 all the time.  We are all used to having six useless getty processes and
 nobody cares.
 
 Yes, thats a great plan.
 
It gets my vote.

/D

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] stats_block_level

2007-07-27 Thread Alvaro Herrera
Peter Eisentraut wrote:
 Tom Lane wrote:
   Any reason not to just fold them both into stats_start_collector ?
 
  Well, then you couldn't turn collection on and off without restarting
  the postmaster, which might be a pain.
 
 Maybe we don't actually need stats_start_collector, but instead we start 
 it always and just have one knob to turn collection on and off.  I'm 
 not sure whether the extra process would bother people if they're not 
 collecting, but we have so many extra processes now, why would anyone 
 care.

I agree.  Let's remove stats_start_collector and merge the other two
into a single setting.  Anything more than that is overkill.

Having a single idle process is not a problem to anyone.  It just sleeps
all the time.  We are all used to having six useless getty processes and
nobody cares.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] stats_block_level

2007-07-27 Thread Simon Riggs
On Fri, 2007-07-27 at 04:29 -0400, Alvaro Herrera wrote:
 Peter Eisentraut wrote:
  Tom Lane wrote:
Any reason not to just fold them both into stats_start_collector ?
  
   Well, then you couldn't turn collection on and off without restarting
   the postmaster, which might be a pain.
  
  Maybe we don't actually need stats_start_collector, but instead we start 
  it always and just have one knob to turn collection on and off.  I'm 
  not sure whether the extra process would bother people if they're not 
  collecting, but we have so many extra processes now, why would anyone 
  care.
 
 I agree.  Let's remove stats_start_collector and merge the other two
 into a single setting.  Anything more than that is overkill.
 
 Having a single idle process is not a problem to anyone.  It just sleeps
 all the time.  We are all used to having six useless getty processes and
 nobody cares.

Yes, thats a great plan.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] default_text_search_config and expression indexes

2007-07-27 Thread Magnus Hagander
On Thu, Jul 26, 2007 at 06:23:51PM -0400, Bruce Momjian wrote:
 Oleg Bartunov wrote:
   Second, I can't figure out how to reference a non-default
   configuration.
  
   See the multi-argument versions of to_tsvector etc.
  
   I do see a problem with having to_tsvector(config, text) plus
   to_tsvector(text) where the latter implicitly references a config
   selected by a GUC variable: how can you tell whether a query using the
   latter matches a particular index using the former?  There isn't
   anything in the current planner mechanisms that would make that work.
  
  Probably, having default text search configuration is not a good idea
  and we could just require it as a mandatory parameter, which could
  eliminate many confusion with selecting text search configuration.
 
 We have to decide if we want a GUC default_text_search_config, and if so
 when can it be changed.
 
 Right now there are three ways to create a tsvector (or tsquery)
 
   ::tsvector
   to_tsvector(value)
   to_tsvector(config, value)
 
 (ignoring plainto_tsvector)
 
 Only the last one specifies the configuration. The others use the
 configuration specified by default_text_search_config.  (We had an
 previous discussion on what the default value of
 default_text_search_config should be, and it was decided it should be
 set via initdb based on a flag or the locale.)
 
 Now, because most people use a single configuration, they can just set
 default_text_search_config and there is no need to specify the
 configuration name.
 
 However, expression indexes cause a problem here:
 
   
 http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX
 
 We recommend that users create an expression index on the column they
 want to do a full text search on, e.g.
 
   CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(body));
 
 However, the big problem is that the expressions used in expression
 indexes should not change their output based on the value of a GUC
 variable (because it would corrupt the index), but in the case above,
 default_text_search_config controls what configuration is used, and
 hence the output of to_tsvector is changed if default_text_search_config
 changes.

It wuoldn't actually *corrupt* the index, right? You could end up with
wrong results, which might be regarded as corruption in one way, but as
long as you change the value back the index still works, no?


 We have a few possible options:
 
   1) Document the problem and do nothing else.
   2) Make default_text_search_config a postgresql.conf-only
  setting, thereby making it impossible to change by non-super
  users, or make it a super-user-only setting.
   3) Remove default_text_search_config and require the
  configuration to be specified in each function call.
 
 If we remove default_text_search_config, it would also make ::tsvector
 casting useless as well.

I think 3 is a really bad solution.

2 is a half-bad solution. Do we have a way to say that it can be set at
database-level for example, but not at user session? Making it
superuser-only to change it but not postgresql.conf-only could accomplish
that, along with warnings in the docs for the super user about the effects
on current indexes by changing it.

//Magnus

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-27 Thread Alvaro Herrera
ITAGAKI Takahiro wrote:
 Simon Riggs [EMAIL PROTECTED] wrote:
 
  Read the heap blocks in sequence, but make a conditional lock for
  cleanup on each block. If we don't get it, sleep, then try again when we
  wake up. If we fail the second time, just skip the block completely.

It would be cool if we could do something like sweep a range of pages,
initiate IO for those that are not in shared buffers, and while that is
running, lock and clean up the ones that are in shared buffers, skipping
those that are not lockable right away; when that's done, go back to
those buffers that were gotten from I/O and clean those up.  And retry
the locking for those that couldn't be locked the first time around,
also conditionally.  And when that's all done, a third pass could get
those blocks that weren't cleaned up in none of the previous passes (and
this time the lock would not be conditional).

Then do a vacuum_delay sleep.

 When we allow some skips in removing dead tuples, can we guarantee
 pg_class.relfrozenxid?

No we can't.

 I think we might need additional freezing-xmax operations to avoid
 XID-wraparound in the first path of vacuum, though it hardly occurs.

I'm not sure I follow.  Can you elaborate?  Do you mean storing a
separate relfrozenxmax for each table or something like that?

 It might be a future topic ... if we are in the direciton of 
 optimistic sweeping, is it possible to remove the second path of vacuum
 completely? We just add XID of the vacuum to dead tuples we see in the
 first path. When backends find a dead tuple and see the transaction
 identified by XID in it has commited, they can freely reuse the area of
 the dead tuple because we can assume index entries pointing the tuple
 have been removed by the vacuum.

I would be worried about leftover index entries being later used by new
tuples in the heap.  Then when you visit the index, find that entry, go
to the heap and find the new tuple and return it, which could be bogus.
(Unless, I think, you check in the index when you are going to insert
the new index tuple -- if the CTID is already used, reuse that entry or
remove it before insertion).

I don't know.  Maybe it's OK but it seems messy even if it is.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2007-07-27 Thread Andrew Dunstan


[redirecting to -hackers]

Stephen Frost wrote:

* Gregory Stark ([EMAIL PROTECTED]) wrote:
  

Tom Lane [EMAIL PROTECTED] writes:



Stephen Frost [EMAIL PROTECTED] writes:
  

  Please find attached a minor patch to remove the constraints that a
  user can't include the delimiter or quote characters in a 'NULL AS'
  string when importing CSV files.


This can't really be sane can it?
  
  


Not very, no :-)
  

The alternative would be interpreting NULL strings after dequoting but that
would leave no way to include the NULL string literally. This solution means
there's no way to include it (if it needs quoting) but only when you specify
it this way.



Yeah, interpreting NULLs after dequoting means you've lost the
information about if it's quoted or not, or you have to add some funky
syntax to say if it's quoted, do it differently..., which is no good,
imv.

What the patch does basically is say give us the exact string that
shows up between the unquoted delimiters that you want to be treated
as a NULL.  This removes the complexity of the question about quoting,
unquoting, whatever, and makes it a very clear-cut, straight-forward
solution with no impact on existing users, imv.


  


This looks too clever by half, to me. Someone facing the problem you are 
facing would have to dig quite deep to find the solution you're promoting.


A much better way IMNSHO would be to add an extra FORCE switch. On 
input, FORCE NOT NULL says to treat an unquoted null as the literal 
value rather than as a null field for the columns named. The reverse 
would be to tell it to treat a quoted null as null rather than as the 
literal value, for the named columns. Perhaps that should just be FORCE 
NULL columnlist. It would be more explicit and at the same time would 
only apply to the named columns, rather than discarding totally the 
ability to distinguish between null and not null values.


This should probably be discussed on -hackers, anyway.



cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] LSN grouping within clog pages

2007-07-27 Thread Simon Riggs
On Thu, 2007-07-26 at 22:37 -0400, Tom Lane wrote:

 I've been looking at the way that the async-commit patch conserves
 shared memory space by remembering async commit LSNs for groups of
 transactions on a clog page, rather than having an LSN for each
 individual transaction slot.  This seems like a good plan to me,
 but I'm confused about one point.  The README text claims that each
 LSN represents a contiguous group of transactions, that is, with the
 proposed parameters each LSN would represent 256 sequential
 transactions.  However, it looks to me that what the code is actually
 doing:
 
 #define GetLSNIndex(slotno, xid)((slotno) * CLOG_LSNS_PER_PAGE + \
  (xid) % (TransactionId) 
 CLOG_XACTS_PER_LSN)
 
 results in transactions that are spaced 256 XIDs apart sharing the same
 LSN slot.  I'm not sure whether the code is good and the README is
 bogus, or vice versa. 

Well, in this case the README correctly expresses my intention and the
code does not.

  Sharing LSNs among contiguous groups of XIDs
 seems appealing because you'd expect that such a group would have
 relatively close LSNs, and so not much information is lost.  OTOH, the
 modulo idea is interesting too, because if the transaction rate is less
 than 256 commits per walwriter cycle, you'd effectively have exact
 information for all the currently unflushed transactions.  But the
 downside is that transactions that are really quite old might
 transiently appear un-hintable because some later transaction that
 happens to share that LSN slot isn't flushed yet.

The modulo idea was my original intention and I attempted to change this
in v23. The above paragraph is pretty much how my thoughts evolved and
now I'm fully on the side of contiguous rather than striped, even though
the code says differently.

 BTW, I don't think I believe at all the arguments given in the README
 about what CLOG_LSNS_PER_PAGE should be, particularly since possible
 changes in BLCKSZ weren't factored in.  I'm inclined to set it so
 that the LSNs take up the same amount of space as the clog buffers
 themselves, ie, BLCKSZ/8 LSNs per page.

Sure, BLCKSZ should be factored in. Do we really need so many LSNs
though?

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Document and/or remove unreachable code in tuptoaster.c from varvarlena patch

2007-07-27 Thread Gregory Stark

Testers here were having a hard time constructing test cases to reach some
lines touched by the varvarlena patch. Upon further investigation I'm
convinced they're unreachable.

Some were added when I did packed varlena -- I've removed those. These lines
were actually necessary earlier but when we made attstorage='p' exempt columns
from SHORT treatment that made these lines unreachable. I think they impede
the readability so it's best to just remove them.

The other lines I think are useful if only to make the code self-documenting.
But they're unreachable due to the way the loop works and the way it tracks
toast_action so I documented that fact for the next person using gcov.

I also couldn't find a way to trigger the pfree() lines but I'm still looking
at that. Even when we're retoasting previously detoasted datums from an
updated row we still don't call pfree so something's a bit weird here.

Lastly, we currently never compress anything below 256 bytes so we never
compress SHORT varlenas. But trying to compress a datum only to find it's
uncompressible is a fairly expensive operation. Aside from doing a palloc we
also end up having to do a whole iteration of this loop including
recalculating the size of the tuple and looking for the next largest datum.

I would suggest we should decrease the minimum size to 32 instead of the
current 256 which will mean we could compress SHORT data. But as long as we
don't do that we should have a check like below which will avoid trying to.
(We might still have a check against VARSIZE in toast_compress_datum to avoid
the palloc.)


Index: tuptoaster.c
===
RCS file: 
/home/stark/src/REPOSITORY/pgsql/src/backend/access/heap/tuptoaster.c,v
retrieving revision 1.74
diff -c -r1.74 tuptoaster.c
*** tuptoaster.c6 Apr 2007 04:21:41 -   1.74
--- tuptoaster.c27 Jul 2007 14:38:14 -
***
*** 535,541 
need_change = true;
need_free = true;
}
! 
/*
 * Remember the size of this attribute
 */
--- 535,548 
need_change = true;
need_free = true;
}
!   else if (VARATT_IS_SHORT(new_value) || 
VARSIZE(new_value)  256)
!   {
!   /* The default pg_lz_compress strategy doesn't 
compress things
!* under 256 bytes so skip iterations through 
the loop trying
!* to compress them */
!   toast_action[i] = 'x';
!   }
!   
/*
 * Remember the size of this attribute
 */
***
*** 590,596 
if (toast_action[i] != ' ')
continue;
if (VARATT_IS_EXTERNAL(toast_values[i]))
!   continue;
if (VARATT_IS_COMPRESSED(toast_values[i]))
continue;
if (att[i]-attstorage != 'x')
--- 597,604 
if (toast_action[i] != ' ')
continue;
if (VARATT_IS_EXTERNAL(toast_values[i]))
!   /* Dead code due to toast_action */
!   continue; 
if (VARATT_IS_COMPRESSED(toast_values[i]))
continue;
if (att[i]-attstorage != 'x')
***
*** 654,659 
--- 662,668 
if (toast_action[i] == 'p')
continue;
if (VARATT_IS_EXTERNAL(toast_values[i]))
+   /* Dead code due to toast_action */
continue;
if (att[i]-attstorage != 'x'  att[i]-attstorage != 
'e')
continue;
***
*** 703,712 
--- 712,723 
if (toast_action[i] != ' ')
continue;
if (VARATT_IS_EXTERNAL(toast_values[i]))
+   /* Dead code due to toast_action */
continue;
if (VARATT_IS_COMPRESSED(toast_values[i]))
continue;
if (att[i]-attstorage != 'm')
+   /* Dead code (what else could attstorage be at 
this point?) */
continue;
if (toast_sizes[i]  biggest_size)
{
***
*** 766,771 
--- 777,783 
  

Re: [HACKERS] stats_block_level

2007-07-27 Thread Simon Riggs
On Fri, 2007-07-27 at 10:15 +0100, Dave Page wrote:
 Simon Riggs wrote:
  On Fri, 2007-07-27 at 04:29 -0400, Alvaro Herrera wrote:
  Peter Eisentraut wrote:
  Tom Lane wrote:
  Any reason not to just fold them both into stats_start_collector ?
  Well, then you couldn't turn collection on and off without restarting
  the postmaster, which might be a pain.
  Maybe we don't actually need stats_start_collector, but instead we start 
  it always and just have one knob to turn collection on and off.  I'm 
  not sure whether the extra process would bother people if they're not 
  collecting, but we have so many extra processes now, why would anyone 
  care.
  I agree.  Let's remove stats_start_collector and merge the other two
  into a single setting.  Anything more than that is overkill.
 
  Having a single idle process is not a problem to anyone.  It just sleeps
  all the time.  We are all used to having six useless getty processes and
  nobody cares.
  
  Yes, thats a great plan.
  
 It gets my vote.

Look to -patches for an implementation of the above.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Document and/or remove unreachable code in tuptoaster.c from varvarlena patch

2007-07-27 Thread Gregory Stark

Sorry, meant to send the previous message to pgsql-patches.

Here's a version cut using cvs diff so it's usable with -p0

I added one more fixup. There was a silly test in toast_fetch_datum_slice()
which handled compressed datums. Returning a slice of a compressed datum is
nonsensical with toast since the resulting datum would be useless. I also
added an assertion in this function that the datum is external before we treat
it as a toast_pointer.

(Incidentally, I did eventually manage to construct a case to reach all the 
pfrees.)



tuptoaster-fixup.patch.gz
Description: Binary data



-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2007-07-27 Thread Andrew Dunstan



Stephen Frost wrote:


I'm honestly not a big fan of the columnlist approach that's been
taken with the options.  While I understand the desire to seperate the
parsing from the typing, making the users essentially do that association
for us by way of making them specify how to handle each column explicitly
is worse than just accepting that different types may need to be handled
in different ways.
  


Whether or not you like it, the fact is it's there. I think any solution 
should be consistent with what is done.


When CSV was first discussed we looked at doing type-specific behaviour. 
The end of the long debate was that we simply couldn't do that safely, 
and the only recourse was to require the user to specify the behaviour 
required if it differed from the default. You might be inclined to want 
to revisit that, but I am not.


cheers

andrew



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2007-07-27 Thread Stephen Frost
* Andrew Dunstan ([EMAIL PROTECTED]) wrote:
 This looks too clever by half, to me. Someone facing the problem you are 
 facing would have to dig quite deep to find the solution you're promoting.

Oddly enough, it was one of the first things I tried when I discovered
it wasn't just realizing that ,, for an integer column meant NULL (and
instead was complaining loudly that you can't convert an empty string
into an integer).  It's also pretty clear, to me at least, to say 
put the exact string that shows up between the delimiters here
that you want treated as a NULL rather than well, if it's a column
which is quoted then you have to jump through these hoops and tell PG
about each one, but if it's not quoted you have to do this, etc, etc.

 A much better way IMNSHO would be to add an extra FORCE switch. On input, 
 FORCE NOT NULL says to treat an unquoted null as the literal value rather 
 than as a null field for the columns named. The reverse would be to tell it 
 to treat a quoted null as null rather than as the literal value, for the 
 named columns. Perhaps that should just be FORCE NULL columnlist. It 
 would be more explicit and at the same time would only apply to the named 
 columns, rather than discarding totally the ability to distinguish between 
 null and not null values.

I don't see that it needs to be 'more explicit', that's just silly.
Either the user indicated they want it, or they didn't.  What you're
suggesting adds in a bunch of, imv, unnecessary complication and ends up
making the resulting code that much bigger and uglier for not much gain.

I'm honestly not a big fan of the columnlist approach that's been
taken with the options.  While I understand the desire to seperate the
parsing from the typing, making the users essentially do that association
for us by way of making them specify how to handle each column explicitly
is worse than just accepting that different types may need to be handled
in different ways.

We could instead flip it around and force the users to specify, for
each column, what, exactly, should be done for that column by having
them specify a regexp for that column.  The regexp would implicitly have
the delimiter on each side of it and we'd just step through the string
matching as far as we can for each column.  Then it's nice and explicit
for everyone but probably not much fun to use.

 This should probably be discussed on -hackers, anyway.

As a small, unobtrusive patch, I felt it didn't need a long discussion
about what everyone's CSV files look like and how that just shouldn't
be done or that's just not sane.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] allow CSV quote in NULL

2007-07-27 Thread Stephen Frost
* Andrew Dunstan ([EMAIL PROTECTED]) wrote:
 Stephen Frost wrote:
 I'm honestly not a big fan of the columnlist approach that's been
 taken with the options.  While I understand the desire to seperate the
 parsing from the typing, making the users essentially do that association
 for us by way of making them specify how to handle each column explicitly
 is worse than just accepting that different types may need to be handled
 in different ways.
   

 Whether or not you like it, the fact is it's there. I think any solution 
 should be consistent with what is done.

Other, unrelated, options being or not being there doesn't really have
any bearing on this though.  I'm not inventing new syntax here.  I'm
just removing a restriction on what the user can do that doesn't need
to exist.  Indeed, other more convoluted and complex things could still
be added, if someone wants them, this doesn't prevent that.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Updated tsearch documentation

2007-07-27 Thread Bruce Momjian
Oleg Bartunov wrote:
 On Thu, 26 Jul 2007, Bruce Momjian wrote:
 
  Oleg Bartunov wrote:
  On Wed, 25 Jul 2007, Erikjan wrote:
 
  In
  http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT
 
  it says:
 
  A document is any text file that can be opened, read, and modified.
 
  OOps, in my original documentation it was:
  Document, in usual meaning, is a text file, that one could open, read and 
  modify.
  I stress that in database document is something another.
 
  http://www.sai.msu.su/~megera/postgres/fts/doc/fts-whatdb.html
 
  I have updated the documentation:
 
  
  http://momjian.us/expire/fulltext/HTML/textsearch-intro.html#TEXTSEARCH-DOCUMENT
 
 
 Is't worth to reference OpenFTS which used for indexing file system ?

Uh, not sure.  I don't think so but we can add a URL to it if you can
find the right place.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Updated tsearch documentation

2007-07-27 Thread Bruce Momjian

Thanks, I found a few more places that needed updating.  It should be
accurate now.  Thanks for the report.

---

Oleg Bartunov wrote:
 On Thu, 26 Jul 2007, Bruce Momjian wrote:
 
  Oleg Bartunov wrote:
  Bruce,
 
  I sent you link to my wiki page with summary of changes
  http://www.sai.msu.su/~megera/wiki/ts_changes
 
  Your documentation looks rather old.
 
  I have updated it to reflect your changes:
 
  http://momjian.us/expire/fulltext/HTML/textsearch-tables.html
 
 
 Bruce, I noticed you miss many changes. For example,
 
 
 options for stemmer has changed (it's documented in my ts_changes), 
 so in 
 http://momjian.us/expire/fulltext/HTML/textsearch-tables.html#TEXTSEARCH-TABLES-CONFIGURATION
 
 ALTER TEXT SEARCH DICTIONARY en_stem SET OPTION 'english-utf8.stop';
 
 should be
 
 
 ALTER TEXT SEARCH DICTIONARY en_stem SET OPTION 
 'StopFile=english-utf8.stop, Language=english';
 
 
 Also, this is wrong
 
 DROP TEXT SEARCH CONFIGURATION MAPPING ON pg FOR email, url, sfloat, uri, 
 float;
 
 it should be
 
 ALTER TEXT SEARCH CONFIGURATION pg DROP MAPPING FOR email, url, sfloat, uri, 
 float;
 
 Configuration now doesn't have DEFAULT flag, so \dF should not display 'Y'
 
 
 = \dF
 pg_catalog | russian  | Y
 public | pg   | Y
 
 
 This is what I see now
 
 postgres=# \dF public.*
 List of fulltext configurations
   Schema | Name | Description
 +--+-
   public | pg   |
 
 
 
 
  ---
 
 
 
  Oleg
  On Tue, 24 Jul 2007, Bruce Momjian wrote:
 
 
  I have added more documentation to try to show how full text search is
  used by user tables.  I think this the documentaiton is almost done:
 
http://momjian.us/expire/fulltext/HTML/textsearch-tables.html
 
  ---
 
  Oleg Bartunov wrote:
  On Wed, 18 Jul 2007, Bruce Momjian wrote:
 
  Oleg, Teodor,
 
  I am confused by the following example.  How does gin know to create a
  tsvector, or does it?  Does gist know too?
 
  No, gist doesn't know. I don't remember why, Teodor ?
 
  For GIN see 
  http://archives.postgresql.org/pgsql-hackers/2007-05/msg00625.php
  for discussion
 
 
  FYI, at some point we need to chat via instant messenger or IRC to
  discuss the open items.  My chat information is here:
 
  http://momjian.us/main/contact.html
 
  I send you invitation for google talk, I use only chat in gmail.
  My gmail account is [EMAIL PROTECTED]
 
 
  ---
 
  SELECT title
  FROM pgweb
  WHERE textcat(title,body) @@ plainto_tsquery('create table')
  ORDER BY dlm DESC LIMIT 10;
 
  CREATE INDEX pgweb_idx ON pgweb USING gin(textcat(title,body));
 
 
 
   Regards,
   Oleg
  _
  Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
  Sternberg Astronomical Institute, Moscow University, Russia
  Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
  phone: +007(495)939-16-83, +007(495)939-23-83
 
 
 
 Regards,
 Oleg
  _
  Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
  Sternberg Astronomical Institute, Moscow University, Russia
  Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
  phone: +007(495)939-16-83, +007(495)939-23-83
 
 
 
   Regards,
   Oleg
 _
 Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
 Sternberg Astronomical Institute, Moscow University, Russia
 Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
 phone: +007(495)939-16-83, +007(495)939-23-83

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Updated tsearch documentation

2007-07-27 Thread Bruce Momjian
Dimitri Fontaine wrote:
-- Start of PGP signed section.
 Hi,
 
 Le mercredi 25 juillet 2007, Bruce Momjian a ?crit?:
  I have added more documentation to try to show how full text search is
  used by user tables.  I think this the documentaiton is almost done:
 
  http://momjian.us/expire/fulltext/HTML/textsearch-tables.html
 
 I've come to understand that GIN indexes are far more costly to update than 
 GiST one, and Oleg's wiki advice users to partition data and use GiST index 
 for live part and GIN index for archive part only.
 
 Is it worth mentioning this into this part of the documentation?
 And if mentioned here, partitioning step could certainly be part of the 
 example... or let it as a user exercise, but then explaining why GIN is a 
 good choice in the provided example.

Partitioning is already in the documentation:

Partitioning of big collections and the proper use of GiST and GIN
indexes allows the implementation of very fast searches with online
update. Partitioning can be done at the database level using table
inheritance and varnameconstraint_exclusion/, or distributing
documents over servers and collecting search results using the
filenamecontrib/dblink/ extension module. The latter is possible
because ranking functions use only local information.

I don't see a reason to provide an example beyond the existing examples
of how to do partitioning.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] default_text_search_config and expression indexes

2007-07-27 Thread Bruce Momjian
Magnus Hagander wrote:
  However, the big problem is that the expressions used in expression
  indexes should not change their output based on the value of a GUC
  variable (because it would corrupt the index), but in the case above,
  default_text_search_config controls what configuration is used, and
  hence the output of to_tsvector is changed if default_text_search_config
  changes.
 
 It wuoldn't actually *corrupt* the index, right? You could end up with
 wrong results, which might be regarded as corruption in one way, but as
 long as you change the value back the index still works, no?

Right, it would _temporarily_ corrupt it.  ;-)

  We have a few possible options:
  
  1) Document the problem and do nothing else.
  2) Make default_text_search_config a postgresql.conf-only
 setting, thereby making it impossible to change by non-super
 users, or make it a super-user-only setting.
  3) Remove default_text_search_config and require the
 configuration to be specified in each function call.
  
  If we remove default_text_search_config, it would also make ::tsvector
  casting useless as well.
 
 I think 3 is a really bad solution.
 
 2 is a half-bad solution. Do we have a way to say that it can be set at
 database-level for example, but not at user session? Making it
 superuser-only to change it but not postgresql.conf-only could accomplish
 that, along with warnings in the docs for the super user about the effects
 on current indexes by changing it.

OK, here is what I am thinking.  If we make default_text_search_config
super-user-only, then the user can't do SET (using zero_damaged_pages
as a superuser-only example):

test= set zero_damaged_pages = on;
ERROR:  permission denied to set parameter zero_damaged_pages

test= alter user guest set zero_damaged_pages = on;
ERROR:  permission denied to set parameter zero_damaged_pages

but the super-user can set it in postgresql.conf, or:

test=# alter user guest set zero_damaged_pages = on;
ALTER ROLE

or

test=# alter database vendor3 set zero_damaged_pages = on;
ALTER ROLE

meaning while it will be super-user-only, the administrator can set the
default for specific databases and users.  Is that the best approach?

A user can still over-ride the default by specifying the configuration
in the function call.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq