Re: [GENERAL] full text indexing

2006-03-16 Thread Oleg Bartunov

Chris,

try REL8_1_STABLE branch, because you save a lot of time of indexing

Oleg
On Fri, 17 Mar 2006, Chris wrote:


Oleg Bartunov wrote:

On Wed, 15 Mar 2006, chris smith wrote:


Hi all,

Just wondering which full text module is better & what the differences
are between tsearch and fti ?


if you need online indexing and linguistic support (dictionaries, stop 
words, ranking) tsearch2 is fine. If your data are static and you need

only strict search, fti could works for you.
We hope to develop inverted index support for tsearch2 this year, though.



The table in question has roughly 80,000 rows.



how many unique words and how long are documents ?


I think I'll stick to tsearch, it seems to work pretty well.

I created the words list from one table (2 columns) for fti. 2.6G file and 
almost 200 million records, and took up almost 8.5G of space in the database.


I was going to see if I could do a speed comparison between the two but ran 
out of space before I could create the indexes on this table.


Hmm. Might try it on a different table and see what happens, I'm interested 
to see the differences (if any) between the results :)





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] full text indexing

2006-03-16 Thread Chris

Oleg Bartunov wrote:

On Wed, 15 Mar 2006, chris smith wrote:


Hi all,

Just wondering which full text module is better & what the differences
are between tsearch and fti ?


if you need online indexing and linguistic support (dictionaries, stop 
words, ranking) tsearch2 is fine. If your data are static and you need

only strict search, fti could works for you.
We hope to develop inverted index support for tsearch2 this year, though.



The table in question has roughly 80,000 rows.



how many unique words and how long are documents ?


I think I'll stick to tsearch, it seems to work pretty well.

I created the words list from one table (2 columns) for fti. 2.6G file 
and almost 200 million records, and took up almost 8.5G of space in the 
database.


I was going to see if I could do a speed comparison between the two but 
ran out of space before I could create the indexes on this table.


Hmm. Might try it on a different table and see what happens, I'm 
interested to see the differences (if any) between the results :)


--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] full text indexing

2006-03-15 Thread Ian Harding
On 3/15/06, chris smith <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Just wondering which full text module is better & what the differences
> are between tsearch and fti ?

Having only used tsearch/tsearch2 all I can say that it works as
advertised and I am extremely happy with it.

- Ian

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] full text indexing

2006-03-15 Thread Oleg Bartunov

On Wed, 15 Mar 2006, chris smith wrote:


Hi all,

Just wondering which full text module is better & what the differences
are between tsearch and fti ?


if you need online indexing and linguistic support (dictionaries, stop 
words, ranking) tsearch2 is fine. If your data are static and you need

only strict search, fti could works for you.
We hope to develop inverted index support for tsearch2 this year, though.



The table in question has roughly 80,000 rows.


how many unique words and how long are documents ?




Thanks!
--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [GENERAL] full text indexing

2006-03-15 Thread Richard Huxton

chris smith wrote:

Hi all,

Just wondering which full text module is better & what the differences
are between tsearch and fti ?

The table in question has roughly 80,000 rows.


I've been very happy with tsearch2. Note that if you're running an old 
version of PostgreSQL (7.4?) there are some manual steps you'll need to 
take to dump + restore.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[GENERAL] full text indexing

2006-03-15 Thread chris smith
Hi all,

Just wondering which full text module is better & what the differences
are between tsearch and fti ?

The table in question has roughly 80,000 rows.

Thanks!
--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Full Text Indexing and Syntax

2006-02-28 Thread Joshua D. Drake




EXPLAIN
SELECT t1.id, t2.id
FROM test1 t1, test2 t2
WHERE lower( t1.keyword ) ~ ( lower ( '^' || t2.article ));


It doesn't appear that you're using tsearch2. PostgreSQL does not 
include full text search in the basic installation. Have you installed 
tsearch2?

Which is included in the PostgreSQL contrib

Joshua D. Drake



Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 6: explain analyze is your friend




--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Full Text Indexing and Syntax

2006-02-28 Thread Michael Glaesemann


On Mar 1, 2006, at 6:54 , flood wrote:


Unfortunately I can not seem to get my query to use PG's full text
indexing, it keeps doing a seq scan:




EXPLAIN
SELECT t1.id, t2.id
FROM test1 t1, test2 t2
WHERE lower( t1.keyword ) ~ ( lower ( '^' || t2.article ));


It doesn't appear that you're using tsearch2. PostgreSQL does not  
include full text search in the basic installation. Have you  
installed tsearch2?


Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Full Text Indexing and Syntax

2006-02-28 Thread flood
Hi folks, I am having some trouble with this query that should be using
FTI.  There are 2 tables, one with a list of keywords and the other
containing a body of articles.

I am trying to get a query to return the IDs of each keyword with the
ID of each article that contains that keyword.

So the 2 tables are:

test1:
 ->id
 ->keyword

test2:
 ->id
 ->article

Unfortunately I can not seem to get my query to use PG's full text
indexing, it keeps doing a seq scan:

EXPLAIN
SELECT t1.id, t2.id
FROM test1 t1, test2 t2
WHERE lower( t1.keyword ) ~ ( lower ( '^' || t2.article ));
-
 Nested Loop  (cost=20.00..30040.00 rows=5001 width=8)
   Join Filter: (lower("outer".keyword) ~ lower(('^'::text ||
"inner".article)))
   ->  Seq Scan on test1 t1  (cost=0.00..20.00 rows=1000 width=36)
   ->  Materialize  (cost=20.00..30.00 rows=1000 width=36)
 ->  Seq Scan on test2 t2  (cost=0.00..20.00 rows=1000
width=36)


Could someone explain how I can restructure this SQL to use the index,
or perhaps suggest a workaround?  

I am using PG 7.4.7 on Debian 3.1.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Full Text Indexing Using Tsearch2-Module

2006-01-23 Thread Praveen Kumar (TUV)
 
Hello All,
I have installed Tsearch-Module for full text indexing .But when I search text 
using gist(idxFTI) index on table I also found 
all data which have same accent. Example
1.If I try search for MANI word it also search for MANY word.
2.If I try search for ANDY word it also search for ANDI word.
Please can you tell me how to avoid this problem ? If I want to search text 
MANI it should search only for MANI not MANY.

-Original Message-
From: Jim Nasby [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 12, 2006 1:01 AM
To: Praveen Kumar (TUV); [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Cc: pgsql-general@postgresql.org
Subject: RE: [SPAM] - how can we use outer join in Postures - Found
word(s) if you received this in error in the Text body


This should really have been sent to the -general mailing list, so I'm adding 
it.

You will need to use OUTER JOIN syntax to accomplish this in PostgreSQL; see 
http://www.postgresql.org/docs/8.0/interactive/queries-table-expressions.html#QUERIES-FROM

Note that most databases (including Oracle) now support JOIN syntax instead of 
other hacks to support outer joins.

-Original Message-
From: Praveen Kumar (TUV) [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 11, 2006 6:39 AM
To: Jim Nasby; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL 
PROTECTED]
Subject: [SPAM] - how can we use outer join in Postures - Found word(s) if you 
received this in error in the Text body




Hello All,
I want to create one table with from many different tables using outer 
joins.Please can you guide how is possible to create in Postgresql.
Let we have syntax for creating table in oracle.If we want to create same table 
in Postgresql  then how will we replace (+) in syntax sothat we can use outer 
join facility in 
PostgreSQL.

CREATE table comp_prod_cert 
AS
select 
tuv_tuvdotcom_mast.tuvdotcom as tuvdotcom,
COALESCE(tuv_location_mast_intl.first_name,tuv_location_mast.first_name) 
|| ' ' || 
COALESCE(tuv_location_mast_intl.second_name,tuv_location_mast.second_name) as 
company_name,
tuv_certificate_mast.cert_id as cert_id,
tuv_certificate_mast.cert_number as certificate_number,
tuv_certificate_mast.cust_id as cust_id,
COALESCE(tuv_zart_mast_intl.description,tuv_zart_mast.description) as 
description,
tuv_tuvdotcom_mast.tuvdotcom_id as tuvdotcom_id,tuv_tuvdotcom_mast.status_id as 
status_id,
'' page_valid_from, sysdate page_creation_date,
tuv_tuvdotcom_mast.tuvdotcom||' 
'||COALESCE(tuv_tuvdotcom_intl.sublease_company_name,tuv_tuvdotcom_mast.sublease_company_name)||'
 '
||COALESCE(tuv_tuvdotcom_intl.marketing_info,tuv_tuvdotcom_mast.marketing_info)||'
 '||tuv_certificate_mast.cert_number||' '
||tuv_certificate_mast.scope_english||' '||tuv_certificate_mast.scope_german||' 
'||tuv_certificate_mast.scope_local||' '
||tuv_zart_mast.zart_name||' 
'||COALESCE(tuv_zart_mast_intl.description,tuv_zart_mast.description)||' '
||COALESCE(tuv_customer_mast_intl.url,tuv_customer_mast.url)||' 
'||COALESCE(tuv_customer_mast_intl.email,tuv_customer_mast.email)||' '
||tuv_customer_mast.name_local||' '||tuv_customer_mast.address_local||' 
'||tuv_customer_mast.building_local||' '
||tuv_customer_mast.city_local||' 
'||COALESCE(tuv_customer_mast_intl.title,tuv_customer_mast.title)||' '
||COALESCE(tuv_customer_mast_intl.first_name,tuv_customer_mast.first_name)||' '
||COALESCE(tuv_customer_mast_intl.second_name,tuv_customer_mast.second_name)||' 
'
||COALESCE(tuv_customer_mast_intl.third_name,tuv_customer_mast.third_name)||' '
||COALESCE(tuv_customer_mast_intl.fourth_name,tuv_customer_mast.fourth_name)||' 
'
||tuv_location_mast.post_code||' '||tuv_location_mast.phone||' 
'||tuv_location_mast.fax||' '
||COALESCE(tuv_location_mast_intl.title,tuv_location_mast.title)||' '
||COALESCE(tuv_location_mast_intl.first_name,tuv_location_mast.first_name)||' '
||COALESCE(tuv_location_mast_intl.second_name,tuv_location_mast.second_name)||' 
'
||COALESCE(tuv_location_mast_intl.third_name,tuv_location_mast.third_name)||' '
||COALESCE(tuv_location_mast_intl.fourth_name,tuv_location_mast.fourth_name)||' 
'
||COALESCE(tuv_location_mast_intl.street_1,tuv_location_mast.street_1)||' '
||COALESCE(tuv_location_mast_intl.street_2,tuv_location_mast.street_2)||' '
||COALESCE(tuv_location_mast_intl.city_1,tuv_location_mast.city_1)||' '
||COALESCE(tuv_location_mast_intl.city_2,tuv_location_mast.city_2)||' '
||COALESCE(tuv_location_mast_intl.state,tuv_location_mast.state)||' 
'||COALESCE(tuv_location_mast_intl.country,tuv_location_mast.country)
as search_data
from
tuv_tuvdotcom_mast,
tuv_tuvdotcom_intl,
tuv_tuvdotcom_type_mast,
tuv_tuvdotcom_system_certs,
tuv_certificate_mast,
tuv_customer_location, 
tuv_location_mast,
tuv_location_mast_intl,
tuv_customer_mast, 
tuv_customer_mast_intl,
tuv_zart_mast,
tuv_zart_mast_intl
where 
tuv_tuvdotcom_mast.tuvdotcom_id = tuv_tuvdotcom_intl.tuvdotcom_id(+) and
tuv_tuvdotcom_mast.tdc_type_id = tuv_tuvdotcom_type_mast.tdc_type_id and
tuv

Re: [GENERAL] full text indexing

2000-10-03 Thread Poul L. Christiansen

Mitch Vincent wrote:
> 
> > Hi
> >
> > In my PostgreSQL database I have a lot of newspaper articles (size:
> > 100mb now, growing beyond 1gb within few months).
> > I wan't to use full text indexing so that users can search the articles
> > with a keyword and have the results in less than one second.
> >
> > How do I accomplish that?
> 
> It's complicated :-)
> 
> > Does PostgreSQL have this feature?
> 
> Nope.
> 
> > Which 3rd party indexing tools are available that easily interoperate
> > with PostgreSQL?
> 
> There is some code in contrib called FTI (Full Text Index - no doubt).. I
> re-wrote it for my uses but ended up not using because I was doing so many
> sorts and joins, it made a scan (even an index scan) of a table with 3
> million rows in it very slow. However id you were just searching the keyword
> table itself it was very, very fast.
> 
> The FTI trigger in the contrib breaks the words down to 2 letter bits (for
> substring searching) -- mine doesn't, it only indexes whole words without
> duplicates and looks at a list of words not to index (words like a, an, the,
> anything else you want -- I think it has the 300 most used English words in
> there already)
> 
> It's drawback is speed, it does take a few seconds to index on INSERT (and
> UPDATE) -- that's if your text fields are 30ish k (very close to the max PG
> can store)..
> 
> I would say you're going to run into the 32k limit pretty quick with
> newspaper articles -- I index resumes and I've run into it many times (32k
> of text really isn't all that much)..

I split the articles up. When pg 7.1 is out, I will merge them together
again :)

> 
> Anyway, I'll try and get that trigger together that I did and send it to the
> PG guys to see if it's worthy of being added to contrib -- I'll send you a
> copy in private if you'd like. Note: I think there have been more people to
> re-write that trigger, I haven't seen anything else though..
> 

Where can i find the FTI code? 

I looked in ftp://ftp.postgresql.org/pub/contrib/ , but no luck :(

Thanks for the reply,
Poul L. Christiansen



Re: [GENERAL] full text indexing

2000-09-28 Thread Martin A. Marques

On Thu, 28 Sep 2000, Gilles DAROLD wrote:
> Hi,
> 
> After taking a look to many way from fast search into full text (flat
> field with regrep, UDMSearch,
> and other stuff like contrib  FTI) i have developped my own indexer and
> fast search.
> It's still specific to my database but can be easily ported to any other
> database structure.
> It also use stop word
> 
> The first load is very slow for big database but I have had incremental
> indexing and then
> following load are speeder.
> 
> Here is the usage of the indexer :
> 
> Usage: ./lincat-indexer.pl [-c -i file -h host -p port] -t table -f field
> -l language -u user -s file -d dbname
> -c=> drop search tables and indexes, reset all indexing
> -i file   => do incremental index with file
> -v=> print a dot for each word processed
> -h host   => database hostname [default: localhost]
> -p port   => database service port [default: 5432]
> -t table  => table to index
> -f field  => field to index
> -l fr => language to use (can be fr, en, ...)
> -u user   => postgres user who need read grant on search tables
> -s file   => file with a list of all words not to be indexed
> -d dbname => name of the database
> -q=> quiet mode, do not print anything else than error
> message
> 
> I also have a search script that can be use to test the index and from you
> can cut and past
> some code to add to your programme.
> 
> usage is ./lincat-search.pl "essai test".
> 
> I don't remember exactly what it does with search (AND/OR) but this depend
> on what you want
> as result, so hack it.
> 
> Of course it's Perl... Search is very fast !
> And in any way you can easily hack it to match your needs.
> 
> Let me know if your are interested I will post it to the list.

I'm interested. I was starting to write a script that would store in another
table the information of certain words with the location, so when I look, I
do it on the table indexed and not with regex in a blob of TEXT.

Anyway I would like to see what you have.

-- 
"And I'm happy, because you make me feel good, about me." - Melvin Udall
-
Martín Marqués  email:  [EMAIL PROTECTED]
Santa Fe - Argentinahttp://math.unl.edu.ar/~martin/
Administrador de sistemas en math.unl.edu.ar
-



Re: [GENERAL] full text indexing

2000-09-28 Thread Gilles DAROLD

Hi,

After taking a look to many way from fast search into full text (flat
field with regrep, UDMSearch,
and other stuff like contrib  FTI) i have developped my own indexer and
fast search.
It's still specific to my database but can be easily ported to any other
database structure.
It also use stop word

The first load is very slow for big database but I have had incremental
indexing and then
following load are speeder.

Here is the usage of the indexer :

Usage: ./lincat-indexer.pl [-c -i file -h host -p port] -t table -f field
-l language -u user -s file -d dbname
-c=> drop search tables and indexes, reset all indexing
-i file   => do incremental index with file
-v=> print a dot for each word processed
-h host   => database hostname [default: localhost]
-p port   => database service port [default: 5432]
-t table  => table to index
-f field  => field to index
-l fr => language to use (can be fr, en, ...)
-u user   => postgres user who need read grant on search tables
-s file   => file with a list of all words not to be indexed
-d dbname => name of the database
-q=> quiet mode, do not print anything else than error
message

I also have a search script that can be use to test the index and from you
can cut and past
some code to add to your programme.

usage is ./lincat-search.pl "essai test".

I don't remember exactly what it does with search (AND/OR) but this depend
on what you want
as result, so hack it.

Of course it's Perl... Search is very fast !
And in any way you can easily hack it to match your needs.

Let me know if your are interested I will post it to the list.

Regards,

Gilles DAROLD

"Poul L. Christiansen" wrote:

> Hi
>
> In my PostgreSQL database I have a lot of newspaper articles (size:
> 100mb now, growing beyond 1gb within few months).
> I wan't to use full text indexing so that users can search the articles
> with a keyword and have the results in less than one second.
>
> How do I accomplish that?
>
> Does PostgreSQL have this feature?
>
> Which 3rd party indexing tools are available that easily interoperate
> with PostgreSQL?
>
> Thanks,
> Poul L. Christiansen
> Dynamic Paper




Re: [GENERAL] full-text indexing

2000-04-18 Thread Brett W. McCoy

On Tue, 18 Apr 2000, Bruce Momjian wrote:

> Let me be specific.  The problem is that without cluster, your fragment
> rows are together in the index, but are all over the heap table, so you
> have to read in all those disk buffers, and that is slow.  With cluster,
> most of your matching fragments are on the same disk page, so one access
> gets them all.
> 
> The nightly CLUSTER is a pain, but the only way I ever got it working
> quickly.

This is pretty painless compared to what I've had to do getting big
full-text systems like Excalibur to work.  I can deal with a nightly
cluster.

Brett W. McCoy
  http://www.chapelperilous.net
---
Twenty two thousand days.
Twenty two thousand days.
It's not a lot.
It's all you've got.
Twenty two thousand days.
-- Moody Blues, "Twenty Two Thousand Days"




Re: [GENERAL] full-text indexing

2000-04-18 Thread Bruce Momjian

> On Tue, 18 Apr 2000, Bruce Momjian wrote:
> 
> > > I agree!  The last bit of advice given in the full text README.  As I
> > > said, I'd built full-text stuff for experimentation (I had maybe 30k of
> > > raw text, which amounted to several 100,000 indexed entries), and I had
> > > clustered it, and it was pretty darn fast, even on a Pentium 233 with only
> > > 48 megs of RAM.  I have significantly better hardware to run it on now.
> > > The original project called MySQL, but it just didn't have what we needed
> > > to put something like this together.
> > 
> > With the original author, testing was fast, but when he loaded all the
> > data, it got very slow.  The problem was that as soon as his data
> > exceeded the buffer cache, performance became terrible.
> 
> How much data are we talking here?  How can one get around this buffer
> cache problem?
> 

You would have to fit _all_ your heap data into the PostgreSQL buffer
cache.  That is a lot of shared memory.  If it was that small, you
wouldn't need full-text indexing.  :-)

-- 
  Bruce Momjian|  http://www.op.net/~candle
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [GENERAL] full-text indexing

2000-04-18 Thread Bruce Momjian

> On Tue, 18 Apr 2000, Bruce Momjian wrote:
> 
> > I have one word for you:  CLUSTER.  Without it, index lookups are too
> > slow.  With it, they are rapid.  I have done some work like this
> > commerically with Ingres, which has an ISAM type that keeps the matching
> > rows pretty close on a newly-created ISAM index.  In PostgreSQL, and
> > regular CLUSTER will keep you good.
> 
> I agree!  The last bit of advice given in the full text README.  As I
> said, I'd built full-text stuff for experimentation (I had maybe 30k of
> raw text, which amounted to several 100,000 indexed entries), and I had
> clustered it, and it was pretty darn fast, even on a Pentium 233 with only
> 48 megs of RAM.  I have significantly better hardware to run it on now.
> The original project called MySQL, but it just didn't have what we needed
> to put something like this together.

With the original author, testing was fast, but when he loaded all the
data, it got very slow.  The problem was that as soon as his data
exceeded the buffer cache, performance became terrible.

-- 
  Bruce Momjian|  http://www.op.net/~candle
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [GENERAL] full-text indexing

2000-04-18 Thread Brett W. McCoy

On Tue, 18 Apr 2000, Bruce Momjian wrote:

> > I agree!  The last bit of advice given in the full text README.  As I
> > said, I'd built full-text stuff for experimentation (I had maybe 30k of
> > raw text, which amounted to several 100,000 indexed entries), and I had
> > clustered it, and it was pretty darn fast, even on a Pentium 233 with only
> > 48 megs of RAM.  I have significantly better hardware to run it on now.
> > The original project called MySQL, but it just didn't have what we needed
> > to put something like this together.
> 
> With the original author, testing was fast, but when he loaded all the
> data, it got very slow.  The problem was that as soon as his data
> exceeded the buffer cache, performance became terrible.

How much data are we talking here?  How can one get around this buffer
cache problem?

Brett W. McCoy
  http://www.chapelperilous.net
---
Twenty two thousand days.
Twenty two thousand days.
It's not a lot.
It's all you've got.
Twenty two thousand days.
-- Moody Blues, "Twenty Two Thousand Days"




Re: [GENERAL] full-text indexing

2000-04-18 Thread Brett W. McCoy

On Tue, 18 Apr 2000, Bruce Momjian wrote:

> I have one word for you:  CLUSTER.  Without it, index lookups are too
> slow.  With it, they are rapid.  I have done some work like this
> commerically with Ingres, which has an ISAM type that keeps the matching
> rows pretty close on a newly-created ISAM index.  In PostgreSQL, and
> regular CLUSTER will keep you good.

I agree!  The last bit of advice given in the full text README.  As I
said, I'd built full-text stuff for experimentation (I had maybe 30k of
raw text, which amounted to several 100,000 indexed entries), and I had
clustered it, and it was pretty darn fast, even on a Pentium 233 with only
48 megs of RAM.  I have significantly better hardware to run it on now.
The original project called MySQL, but it just didn't have what we needed
to put something like this together.

> If you find it slow, let me know.  I have done some benchmarking with
> the author and he found it pretty fast, usually a few seconds.  See the
> section in my book on CLUSTER for information on _why_ it helps.

Thanks, Bruce.

Brett W. McCoy
  http://www.chapelperilous.net
---
Twenty two thousand days.
Twenty two thousand days.
It's not a lot.
It's all you've got.
Twenty two thousand days.
-- Moody Blues, "Twenty Two Thousand Days"




Re: [GENERAL] full-text indexing

2000-04-18 Thread Bruce Momjian

> At 08:40 PM 18/04/00 -0400, Bruce Momjian wrote:
> 
> >I have one word for you:  CLUSTER.  Without it, index lookups are too
> >slow.  With it, they are rapid.  I have done some work like this
> >commerically with Ingres, which has an ISAM type that keeps the matching
> >rows pretty close on a newly-created ISAM index.  In PostgreSQL, and
> >regular CLUSTER will keep you good.
> 
> The only down side of CLUSTER is that you loose your other keys on the
> table, including a primary key, so you loose out on your intregrity checks, 
> which you the have to explictily re-apply with CREATE UNQIUE INDEX ...
> 

Yes, you basically have to recreate them after CLUSTER.

-- 
  Bruce Momjian|  http://www.op.net/~candle
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [GENERAL] full-text indexing

2000-04-18 Thread Jim Richards

At 08:40 PM 18/04/00 -0400, Bruce Momjian wrote:

>I have one word for you:  CLUSTER.  Without it, index lookups are too
>slow.  With it, they are rapid.  I have done some work like this
>commerically with Ingres, which has an ISAM type that keeps the matching
>rows pretty close on a newly-created ISAM index.  In PostgreSQL, and
>regular CLUSTER will keep you good.

The only down side of CLUSTER is that you loose your other keys on the
table, including a primary key, so you loose out on your intregrity checks, 
which you the have to explictily re-apply with CREATE UNQIUE INDEX ...


--
Kumera - a new Open Source Content Management System
for small to medium web sites written in Perl and using XML
http://www.cyber4.org/kumera/index.html



Re: [GENERAL] full-text indexing

2000-04-18 Thread Bruce Momjian

> Does anyone have any experience using Postgres's full-text indexing in a
> production environment?  We're thinking about using it for a project (the
> other solution is to regexp it with Perl...).  I've set up the stuff
> before for experimentation, but am mainly curious about it's performance
> on a live, fairly heavily trafficked server.

I have one word for you:  CLUSTER.  Without it, index lookups are too
slow.  With it, they are rapid.  I have done some work like this
commerically with Ingres, which has an ISAM type that keeps the matching
rows pretty close on a newly-created ISAM index.  In PostgreSQL, and
regular CLUSTER will keep you good.

If you find it slow, let me know.  I have done some benchmarking with
the author and he found it pretty fast, usually a few seconds.  See the
section in my book on CLUSTER for information on _why_ it helps.

http://www.postgresql.org/docs/awbook.html


-- 
  Bruce Momjian|  http://www.op.net/~candle
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026