[DOCS] Documentation bug in 8.3?

2008-01-12 Thread Bruce Momjian
Reading through the text search data type docs:


http://www.postgresql.org/docs/8.3/static/datatype-textsearch.html#DATATYPE-TSVECTOR

it says:

Optionally, integer position(s) can be attached to any or all of the
lexemes:

SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
                                   tsvector
-------------------------------------------------------------------------------
 'a':1,6,10 'on':5 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4

A position normally indicates the source word's location in the
document. Positional information can be used for proximity ranking.
Position values can range from 1 to 16383; larger numbers are silently
clamped to 16383. Duplicate position entries are discarded. 
  

However in my testing of 8.3 duplicate position entries are not
discarded:

test=> SELECT 'a:1 b:1'::tsvector;
  tsvector
-------------
 'a':1 'b':1
(1 row)

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [DOCS] Documentation bug in 8.3?

2008-01-12 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
>   clamped to 16383. Duplicate position entries are discarded. 
> 

> However in my testing of 8.3 duplicate position entries are not
> discarded:

> test=> SELECT 'a:1 b:1'::tsvector;
>   tsvector
> -------------
>  'a':1 'b':1
> (1 row)

Those aren't duplicates, because they're not attached to the same
lexeme.  The comment is talking about this behavior:

regression=# SELECT 'a:1 a:1'::tsvector;
 tsvector 
----------
 'a':1
(1 row)

regression=# SELECT 'a:1,2,1'::tsvector;
 tsvector 
----------
 'a':1,2
(1 row)
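
To spell the rule out, here is a rough Python sketch of the behavior shown above. It is only a toy model, not PostgreSQL's actual tsvector input code: the function name normalize_positions and the dict it returns are made up for illustration, and it ignores quoting, weight labels, and the order in which lexemes are printed. It assumes the 16383 limit quoted from the docs.

```python
from collections import defaultdict

MAX_POS = 16383  # upper bound quoted from the 8.3 docs


def normalize_positions(text):
    """Toy model: gather positions per lexeme, clamp them, drop duplicates.

    Hypothetical helper for illustration only; it does not handle quoted
    lexemes or weight labels.
    """
    positions = defaultdict(set)
    for token in text.split():
        lexeme, _, pos_list = token.partition(":")
        positions[lexeme]  # register the lexeme even without positions
        for p in pos_list.split(","):
            if p:
                # Larger values are clamped to MAX_POS; the set then
                # discards repeats, but only within this one lexeme.
                positions[lexeme].add(min(int(p), MAX_POS))
    return {lex: sorted(ps) for lex, ps in positions.items()}


if __name__ == "__main__":
    for sample in ("a:1 b:1", "a:1 a:1", "a:1,2,1"):
        print(sample, "->", normalize_positions(sample))
```

The point of keying the set by lexeme is that 'a:1 b:1' keeps both entries, while 'a:1 a:1' and 'a:1,2,1' each collapse the repeated position.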

regards, tom lane



Re: [DOCS] Documentation bug in 8.3?

2008-01-12 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > clamped to 16383. Duplicate position entries are discarded. 
> >   
> 
> > However in my testing of 8.3 duplicate position entries are not
> > discarded:
> 
> > test=> SELECT 'a:1 b:1'::tsvector;
> >   tsvector
> > -------------
> >  'a':1 'b':1
> > (1 row)
> 
> Those aren't duplicates, because they're not attached to the same
> lexeme.  The comment is talking about this behavior:
> 
> regression=# SELECT 'a:1 a:1'::tsvector;
>  tsvector 
> ----------
>  'a':1
> (1 row)
> 
> regression=# SELECT 'a:1,2,1'::tsvector;
>  tsvector 
> ----------
>  'a':1,2
> (1 row)

OK, thanks.  I will clarify the documentation.  Patch attached and
applied.

Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.222
diff -c -c -r1.222 datatype.sgml
*** doc/src/sgml/datatype.sgml	2 Jan 2008 19:53:13 -	1.222
--- doc/src/sgml/datatype.sgml	12 Jan 2008 21:50:51 -
***************
*** 3330,3336 ****
   document.  Positional information can be used for
   proximity ranking.  Position values can
   range from 1 to 16383; larger numbers are silently clamped to 16383.
!  Duplicate position entries are discarded.
  
  
  
--- 3330,3336 ----
   document.  Positional information can be used for
   proximity ranking.  Position values can
   range from 1 to 16383; larger numbers are silently clamped to 16383.
!  Duplicate positions for the same lexeme are discarded.
  
  
  
