Re: [GENERAL] MongoDB 3.2 beating Postgres 9.5.1?

2016-07-19 Thread Teodor Sigaev

CREATE INDEX json_tables_idx ON json_tables USING GIN (data jsonb_path_ops);
Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=1 width=1261)
(actual time=2157.118..1259550.327 rows=909091 loops=1)
Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
Rows Removed by Index Recheck: 4360296
Heap Blocks: exact=37031 lossy=872059

Hmm, it looks like work_mem is too small, because the lossy heap block count is very high.
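
For example, something like this should show whether that is the problem (the work_mem value is only an illustration; tune it to your data set):

SET work_mem = '256MB';
EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM json_tables WHERE data @> '{"name": "AC3 Case Red"}'::jsonb;

If the lossy heap block count drops to zero, work_mem was the limiting factor.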


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Test CMake build

2016-02-12 Thread Teodor Sigaev

I tried it on FreeBSD 64-bit, 16Gb, SSD, Core i7

( ./configure && gmake all; )  168,99s user 15,46s system 97% cpu 3:09,61 total
( cmake . && gmake all; )  75,11s user 11,34s system 100% cpu 1:26,30 total

The CMake build is about 2 times faster, which is good, but I don't understand why. Which
optimization level does the CMake build use by default? Which compiler does it pick?
It's not obvious, because the CMake build hides the actual compiler command line.


Yury, please bring back the check target...
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Test CMake build

2016-02-12 Thread Teodor Sigaev



Teodor Sigaev wrote:

I tried it on FreeBSD 64-bit, 16Gb, SSD, Core i7

( ./configure && gmake all; )  168,99s user 15,46s system 97% cpu 3:09,61 total
( cmake . && gmake all; )  75,11s user 11,34s system 100% cpu 1:26,30 total
( CFLAGS='-O2' cmake . && gmake all; )  141,87s user 12,18s system 97% cpu 
2:37,40 total


Oops, the CMake default target is compiled with -O0. With -O2, CMake is still faster,
but not by as much.




The CMake build is about 2 times faster, which is good, but I don't understand why. Which
optimization level does the CMake build use by default? Which compiler does it pick?
It's not obvious, because the CMake build hides the actual compiler command line.

Yury, please bring back the check target...


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Test CMake build

2016-02-12 Thread Teodor Sigaev

Hm, I don't think having the compile/link lines be hidden up is
acceptable.  Many times we need to debug some compile problem, and the
output is mandatory.


+1

Although it could be fixed by
VERBOSE=1 make
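
For a CMake-generated Makefile build, either of these shows the full command lines (this is standard CMake behaviour, nothing PostgreSQL-specific):

gmake VERBOSE=1
cmake -DCMAKE_VERBOSE_MAKEFILE=ON . && gmake all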

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] jsonb value retrieval performance

2015-09-09 Thread Teodor Sigaev

  does it read in the whole jsonb tree structure in memory
and get to v1  or it has some optimization so only get v1 instead
of reading in the whole structure.


It reads, detoasts and decompresses the whole value, and only then executes the search. One
idea for fixing that is to read the jsonb value in chunks, fetching only the parts that are needed.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] jsonb value retrieval performance

2015-09-08 Thread Teodor Sigaev

and I am trying to get value via  jsonb->parentKey->childKey
it seems it is very slow.
Would it be actually faster to use top level key only and parse it at client 
side?


I suppose most of the time is spent decompressing the huge value, not on the actual search
inside the jsonb. If so, we need to implement some search method that decompresses only
some chunks of the jsonb.
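
One workaround worth testing in the meantime (a sketch with hypothetical table/column names; it trades disk space for CPU) is to disable compression for the jsonb column, so access pays only the detoast cost and not pglz_decompress:

ALTER TABLE docs ALTER COLUMN payload SET STORAGE EXTERNAL;
-- note: this affects only values stored after the change; existing rows keep
-- their compressed form until they are rewritten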



Could you send to me an example of that jsonb?



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] jsonb value retrieval performance

2015-09-08 Thread Teodor Sigaev

I suppose most of the time is spent decompressing the huge value, not on the actual search
inside the jsonb. If so, we need to implement some search method that decompresses only
some chunks of the jsonb.

On an artificial example:
%SAMP IMAGE  FUNCTION CALLERS
 92.9 postgres   pglz_decompress  toast_decompress_datum


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Prefix search on all hstore values

2013-11-28 Thread Teodor Sigaev

Hi!

Full-text search has this feature.

# select to_tsvector('en_name=>yes, fr_name=>oui'::hstore::text) @@ 'en:*';
 ?column?
----------
 t


or (to index only the keys)

select to_tsvector(akeys('en_name=>yes, fr_name=>oui'::hstore)::text) @@ 'en:*';
 ?column?
----------
 t

To speed up these queries you can use expression (functional) indexes.
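
For example, something along these lines (the table and column names are made up; 'simple' avoids language-specific stemming):

CREATE INDEX places_names_fts_idx ON places
    USING gin (to_tsvector('simple', names::text));

SELECT * FROM places
 WHERE to_tsvector('simple', names::text) @@ 'en:*'::tsquery;

The two-argument form of to_tsvector is immutable, so it can be used in an index expression; the query has to repeat exactly the same expression for the index to be usable.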


Albert Chern wrote:

Hi,

I have an hstore column that stores a string in several arbitrary languages, so
something like this:

en => string in english, zh => string in chinese, fr => string in french

Is it possible to construct an index that can be used to determine if a query
string is a prefix of ANY of the values in the hstore?  From reading the
documentation the closest I've gotten is a gin index after converting the values
to an array, but that doesn't seem to work with prefix searching.  Any pointers
would be much appreciated!

Thanks,
Albert


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Prefix search on all hstore values

2013-11-28 Thread Teodor Sigaev

My requirements can be relaxed to full text search, but the problem I had with
that approach is I have strings in Chinese, and postgres doesn't seem to support
it.  Calling to_tsvector() on Chinese characters always returns an empty vector.



Hmm, check your locale settings. AFAIK, some people do use FTS with the Chinese language.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS query results with GIN index

2010-01-18 Thread Teodor Sigaev

Basically, I started testing prefix matching in FTS and got into
troubles. Self-contained example follows:


Thank you, fixed. The cause was an incorrect optimization of the GIN scan: GIN reuses the
scan result for equal keys, but the key comparison didn't take the difference in scan
strategies into account.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS query results with GIN index

2010-01-18 Thread Teodor Sigaev

Great, thank you!
I assume this one goes into 8.4.3, right?
Yeah, or apply this patch:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/gin/ginscan.c?r1=1.25&r2=1.26


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS query results with GIN index

2010-01-15 Thread Teodor Sigaev

Thank you for the report; I will look at it this weekend.

Vyacheslav Kalinin wrote:

Hello,

Basically, I started testing prefix matching in FTS and got into
troubles. Self-contained example follows:

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Vacuumdb Fails: Huge Tuple

2009-10-02 Thread Teodor Sigaev

APseudoUtopia apseudouto...@gmail.com writes:

Here's what happened:

$ vacuumdb --all --full --analyze --no-password
vacuumdb: vacuuming database postgres
vacuumdb: vacuuming database web_main
vacuumdb: vacuuming of database web_main failed: ERROR:  huge tuple



PostgreSQL 8.4.0 on i386-portbld-freebsd7.2, compiled by GCC cc (GCC)
4.2.1 20070719  [FreeBSD], 32-bit
Please apply the attached patch. It increases the maximum size from approximately 500 bytes
up to 2700 bytes, so the vacuum will be able to finish.




This is evidently coming out of ginHeapTupleFastCollect because it's
formed a GIN tuple that is too large (either too long a word, or too
many postings, or both).  I'd say that this represents a serious
degradation in usability from pre-8.4 releases: before, you would have
gotten the error upon attempting to insert the table row that triggers
the problem.  Now, with the fast insert stuff, you don't find out
until VACUUM fails, and you have no idea where the bad data is.  Not cool.

Oleg, Teodor, what can we do about this?  Can we split an oversize
tuple into multiple entries?  Can we apply suitable size checks
before instead of after the fast-insert queue?
Both ginHeapTupleFastCollect and ginEntryInsert checked the tuple size against
TOAST_INDEX_TARGET, but ginHeapTupleFastCollect did the check without one ItemPointer,
while ginEntryInsert included it. So ginHeapTupleFastCollect could produce a tuple
6 bytes larger than ginEntryInsert allows, and ginEntryInsert is called during
pending list cleanup.

The patch removes the TOAST_INDEX_TARGET check and checks only against GinMaxItemSize,
which is greater than TOAST_INDEX_TARGET. All size checks are now in GinFormTuple.





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


patch.gz
Description: Unix tar archive

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Vacuumdb Fails: Huge Tuple

2009-10-02 Thread Teodor Sigaev

Looks reasonable, although since the error is potentially user-facing
I think we should put a bit more effort into the error message
(use ereport and make it mention the index name, at least --- is there
any other useful information we could give?)

Only the sizes, as is done in btree, I suppose.


Will you apply this, or do you want me to?


I'm not able to provide a good error message in good English :(


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FILLFACTOR for GIN indexes in 8.3.7

2009-08-14 Thread Teodor Sigaev
it seems that I should reduce the Fill Factor of some FTS indexes, but 
what is the default ?
  The other index methods use fillfactor in different but roughly 
analogous ways;

  the default fillfactor varies between methods


Actually, GIN doesn't use it.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text index not being used

2009-02-01 Thread Teodor Sigaev




I tried to create an index including all of the fields I query on to
see if that would work, but I get an error that the index row is too
large:

= create index master_index on source_listings(geo_lat, geo_lon,
price, bedrooms, region, city, listing_type, to_tsvector('english',
full_listing), post_time);
That's not a full-text index - btree doesn't support the @@ operator. Read this
carefully: http://www.postgresql.org/docs/8.3/static/textsearch.html ,
and about full-text indexes:
http://www.postgresql.org/docs/8.3/static/textsearch-tables.html ,
http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html
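
What should work instead is a GIN index on just the tsvector expression, e.g. (a sketch based on the statement above; the other columns would need their own btree indexes):

CREATE INDEX source_listings_fts_idx ON source_listings
    USING gin (to_tsvector('english', full_listing));

SELECT * FROM source_listings
 WHERE to_tsvector('english', full_listing) @@ to_tsquery('english', 'someword');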


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev

Could you provide a backtrace? Are you using an unchanged norwegian.stop file?
I'm not able to reproduce the bug - postgres just works for me.

Tommy Gildseth wrote:
While trying to create a new dictionary for use with PostgreSQL text 
search, I get a segfault. My Postgres version is 8.3.5



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev

How do I make a backtrace?


- If you have a core dump, just execute gdb /PATH1/postgres /PATH2/core and
type bt. Linux doesn't produce core files by default, so allow it with ulimit -c
unlimited for the postgresql user.
- Or connect to the db and attach gdb to the backend process: gdb /PATH1/postgres
BACKEND_PID, type continue in gdb, then execute the CREATE DICTIONARY statement and type bt in gdb.
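
For example, the core-dump approach looks roughly like this (the paths are placeholders, as above):

ulimit -c unlimited            # for the user postgres runs as, before starting the server
gdb /PATH1/postgres /PATH2/core
(gdb) bt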




Teodor Sigaev wrote:

Could you provide a backtrace? Are you using an unchanged norwegian.stop file?
I'm not able to reproduce the bug - postgres just works for me.

Tommy Gildseth wrote:
While trying to create a new dictionary for use with PostgreSQL text 
search, I get a segfault. My Postgres version is 8.3.5








--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev
I reproduced the bug with the help of Grzegorz's hint on a 64-bit box. So, the patch
is attached and I'm going to commit it.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
*** src/backend/tsearch/spell.c.orig	2009-01-29 18:18:03.000000000 +0300
--- src/backend/tsearch/spell.c	2009-01-29 18:20:09.000000000 +0300
***************
*** 521,527 ****
  					(errcode(ERRCODE_CONFIG_FILE_ERROR),
  					 errmsg("multibyte flag character is not allowed")));
  
! 		Conf->flagval[(unsigned int) *s] = (unsigned char) val;
  		Conf->usecompound = true;
  	}
  
--- 521,527 ----
  					(errcode(ERRCODE_CONFIG_FILE_ERROR),
  					 errmsg("multibyte flag character is not allowed")));
  
! 		Conf->flagval[*(unsigned char*) s] = (unsigned char) val;
  		Conf->usecompound = true;
  	}
  
***************
*** 654,660 ****
  				ptr = repl + (ptr - prepl) + 1;
  				while (*ptr)
  				{
! 					aflg |= Conf->flagval[(unsigned int) *ptr];
  					ptr++;
  				}
  			}
--- 654,660 ----
  				ptr = repl + (ptr - prepl) + 1;
  				while (*ptr)
  				{
! 					aflg |= Conf->flagval[*(unsigned char*) ptr];
  					ptr++;
  				}
  			}
***************
*** 735,741 ****
  
  			if (*s && pg_mblen(s) == 1)
  			{
! 				Conf->flagval[(unsigned int) *s] = FF_COMPOUNDFLAG;
  				Conf->usecompound = true;
  			}
  			oldformat = true;
--- 735,741 ----
  
  			if (*s && pg_mblen(s) == 1)
  			{
! 				Conf->flagval[*(unsigned char*) s] = FF_COMPOUNDFLAG;
  				Conf->usecompound = true;
  			}
  			oldformat = true;
***************
*** 791,797 ****
  					(errcode(ERRCODE_CONFIG_FILE_ERROR),
  					 errmsg("multibyte flag character is not allowed")));
  
! 			flag = (unsigned char) *s;
  			goto nextline;
  		}
  		if (STRNCMP(recoded, "COMPOUNDFLAG") == 0 || STRNCMP(recoded, "COMPOUNDMIN") == 0 ||
--- 791,797 ----
  					(errcode(ERRCODE_CONFIG_FILE_ERROR),
  					 errmsg("multibyte flag character is not allowed")));
  
! 			flag = *(unsigned char*) s;
  			goto nextline;
  		}
  		if (STRNCMP(recoded, "COMPOUNDFLAG") == 0 || STRNCMP(recoded, "COMPOUNDMIN") == 0 ||
***************
*** 851,857 ****
  
  	while (str && *str)
  	{
! 		flag |= Conf->flagval[(unsigned int) *str];
  		str++;
  	}
  
--- 851,857 ----
  
  	while (str && *str)
  	{
! 		flag |= Conf->flagval[*(unsigned char*) str];
  		str++;
  	}
  

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev




Than I have quite few notes about that function:
- affix is not checked on entry, and should be unsigned,


Could be Assert( affix >= 0 && affix < Conf->nAffixData )


- for sake of safety uint32_t should be used instead of unsigned int,
in the cast

see patch

- there should be some safety limit for length of str,

It's a C-string

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev



Tom Lane wrote:

Teodor Sigaev teo...@sigaev.ru writes:

I reproduced the bug with a help of Grzegorz's point for 64-bit box.


Hmm, seems it's not so much a 64 bit error as a signed vs unsigned
char issue?  


Yes, but I don't understand why it worked on a 32-bit box.


Does this affect the old contrib/tsearch2 code?


Will check.


Please try to make the commits in the next eight hours, as we have
release wraps scheduled for tonight.


Minor versions, or the 8.4 beta? If the latter, I'd like to commit btree_gin and
fast_update_gin. For both patches all the issues that were raised have been resolved,
and Jeff doesn't seem to have objections.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev

To be honest, looking through that file, I am quite worried about few
points. I don't know too much about insights of ispell, but I see few
suspicious things in mkSPNode too.
I generally don't want to get involve in reviewing code for stuff I
don't know, But if Teodor (and Oleg) don't mind, I can raise my
points, and see if anything useful comes out of it.

If you see a bug, a mistake or a suspicious point, please don't keep quiet.



Also, about that patch - it doesn't seem to apply cleanly to 8.4,
perhaps that file has changed too much (I based my 'review' above on
8.4's code)

I will tweak it.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev

char issue?  Does this affect the old contrib/tsearch2 code?


Checked - no, that was an improvement introduced in 8.3 :).

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] very long update gin index troubles back?

2009-01-27 Thread Teodor Sigaev

No matter if I drop the trigger that update agg content and the fact
that I'm just updating d, postgresql will update the index?
Yes, due to MVCC. An update of a row produces a new version (a new tuple), and the
new version has to be indexed just as the old one was.
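
A quick way to see the effect (a sketch; the table and column names are made up):

SELECT ctid FROM t WHERE id = 1;
UPDATE t SET d = d WHERE id = 1;   -- touches no GIN-indexed column, even a no-op update
SELECT ctid FROM t WHERE id = 1;   -- the ctid differs: a new row version was created

In most cases that new version also has to be inserted into every index on the table, which is what makes large updates of GIN-indexed tables expensive.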

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] very long update gin index troubles back?

2009-01-24 Thread Teodor Sigaev

Are you going to reply to him with something? He explains it so murkily that I couldn't understand a damn thing.

Ivan Sergio Borgonovo wrote:

I've a table that contain a tsvector that is indexed (gin) and
triggers to update the tsvector that should then update the index.

This gin index has always been problematic. Recreation and updates
were very slow.

Now I had to update 1M rows of that table but for columns
that doesn't involve the tsvector
I dropped the trigger to update the tsvector so that when rows get
updated the trigger won't be called so things should be faster...
but still it is taking forever.

begin;
set constraints all deferred;

select * from FT1IDX_trigger_drop();
update catalog_items set
APrice=p.PrezzoA,
BPrice=p.PrezzoB
from import.catalog_prices p where
catalog_items.ItemID=p.id;
select * from FT1IDX_trigger_create();
commit;

function are used since I've 2 triggers actually that I drop and
create.

Is there anything wrong in the above to make this update so slow on
a 2x Xeon 3.2GHz 4GbRAM and a RAID1 [sic] I know it is slow on write.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] very long update gin index troubles back?

2009-01-24 Thread Teodor Sigaev
A GIN index is slow to update by its very construction. When you update rows, whether
or not the GIN-indexed columns change, postgres (in most cases) will insert new row
versions, so index insertions will occur. So for large updates it's much cheaper to
drop and re-create the index.


That was one of the reasons to develop the fast_insert_gin patch, which is now in
the review process.
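
For a bulk update like the one in this thread, that would look roughly like this (the index name and indexed column are hypothetical, since they aren't shown in the original mail; the other statements come from the quoted post):

DROP INDEX catalog_items_fti_idx;
UPDATE catalog_items SET APrice = p.PrezzoA, BPrice = p.PrezzoB
  FROM import.catalog_prices p WHERE catalog_items.ItemID = p.id;
CREATE INDEX catalog_items_fti_idx ON catalog_items USING gin (fti_vector);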


Ivan Sergio Borgonovo wrote:

I've a table that contain a tsvector that is indexed (gin) and
triggers to update the tsvector that should then update the index.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [TextSearch] syntax error while parsing affix file

2008-12-02 Thread Teodor Sigaev

iconv -f windows-1251 -t utf-8 bulgarian.dic > bulgarian_utf8.dict
iconv -f windows-1251 -t utf-8 bulgarian.aff > bulgarian_utf8.affix

The locale of the database is fr_FR, and its encoding is UTF8.
I believe that the characters 'И', 'А' (non-ASCII) and other Cyrillic ones are not
acceptable in a French locale :(



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [TextSearch] syntax error while parsing affix file

2008-12-02 Thread Teodor Sigaev
I am using Postgres 8.3.5, and I am trying to install a Bulgarian ISpell
dictionary (the OpenOffice one) for the text search features.



flag *A:
.  А (this is line 24)
.  АТА
.  И
.  ИТЕ

OpenOffice or ISpell? Please provide:
- a link to download the dictionary
- the locale and encoding settings of your db

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to reduce impact of a query.

2008-11-17 Thread Teodor Sigaev
The machine in question is a 1GB Ram, AMD 64 with Raid 1 Sata disks. Non 
standard parts of my postgresql.conf are as follows:

max_connections=100
shared_buffers=128MB
work_mem=4MB
maintenance_work_mem=256MB
max_fsm_pages=204800
max_fsm_relations=1500

Any tips appreciated.


Please show:
1) effective_cache_size
2) the query
3) the output of EXPLAIN ANALYZE for the query
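
That is, something like this (the query here is only a stand-in for the real one):

SHOW effective_cache_size;
EXPLAIN ANALYZE SELECT count(*) FROM some_table WHERE some_column = 42;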

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] still gin index creation takes forever

2008-11-13 Thread Teodor Sigaev

Yeah, I'm not convinced either.  Still, Teodor's theory should be easily
testable: set synchronize_seqscans to FALSE and see if the problem goes
away.


Test suite to reproduce the problem:
DROP TABLE IF EXISTS foo;
DROP TABLE IF EXISTS footmp;

CREATE OR REPLACE FUNCTION gen_array()
RETURNS _int4 AS
$$
SELECT ARRAY(
SELECT (random()*1000)::int
FROM generate_series(1,10+(random()*90)::int)
)
$$
LANGUAGE SQL VOLATILE;

SELECT gen_array() AS v INTO foo FROM generate_series(1,10);

VACUUM ANALYZE foo;

CREATE INDEX fooidx ON foo USING gin (v);
DROP INDEX fooidx;

SELECT * INTO footmp FROM foo LIMIT 9;

CREATE INDEX fooidx ON foo USING gin (v);
DROP INDEX fooidx;

On my notebook with HEAD and a default postgresql.conf it produces (showing only
the interesting part):


   postgres=# CREATE INDEX fooidx ON foo USING gin (v);
   Time: 14961,409 ms
   postgres=# SELECT * INTO footmp FROM foo LIMIT 9;
   postgres=# CREATE INDEX fooidx ON foo USING gin (v);
   LOG:  checkpoints are occurring too frequently (12 seconds apart)
   HINT:  Consider increasing the configuration parameter checkpoint_segments.
   LOG:  checkpoints are occurring too frequently (8 seconds apart)
   HINT:  Consider increasing the configuration parameter checkpoint_segments.
   LOG:  checkpoints are occurring too frequently (7 seconds apart)
   HINT:  Consider increasing the configuration parameter checkpoint_segments.
   LOG:  checkpoints are occurring too frequently (10 seconds apart)
   HINT:  Consider increasing the configuration parameter checkpoint_segments.
   LOG:  checkpoints are occurring too frequently (8 seconds apart)
   HINT:  Consider increasing the configuration parameter checkpoint_segments.
   CREATE INDEX
   Time: 56286,507 ms

So the index creation time is about 4 times longer after the SELECT.
Without the SELECT * INTO footmp FROM foo LIMIT 9; step:
   postgres=# CREATE INDEX fooidx ON foo USING gin (v);
   CREATE INDEX
   Time: 13894,050 ms
   postgres=# CREATE INDEX fooidx ON foo USING gin (v);
   LOG:  checkpoints are occurring too frequently (14 seconds apart)
   HINT:  Consider increasing the configuration parameter checkpoint_segments.
   CREATE INDEX
   Time: 15087,348 ms

Nearly the same time.

With synchronize_seqscans = off and the SELECT:
   postgres=# CREATE INDEX fooidx ON foo USING gin (v);
   CREATE INDEX
   Time: 14452,024 ms
   postgres=# SELECT * INTO footmp FROM foo LIMIT 9;
   postgres=# CREATE INDEX fooidx ON foo USING gin (v);
   LOG:  checkpoints are occurring too frequently (16 seconds apart)
   HINT:  Consider increasing the configuration parameter checkpoint_segments.
   CREATE INDEX
   Time: 14557,750 ms

Again, nearly the same time.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] still gin index creation takes forever

2008-11-13 Thread Teodor Sigaev

We could extend IndexBuildHeapScan's API to support that, but I'm
not quite convinced that this is the issue.


That extension might be useful for bitmap indexes too, to simplify the index creation
process.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] still gin index creation takes forever

2008-11-13 Thread Teodor Sigaev

changing it; I've applied a patch for that.  I'm still not quite
convinced that Ivan isn't seeing some other issue though.


Thank you


In the meantime, I noticed something odd while experimenting with your
test case: when running with default maintenance_work_mem = 16MB,
there is a slowdown of 3x or 4x for the un-ordered case, just as you
say.  But at maintenance_work_mem = 200MB I see very little difference.
This doesn't make sense to me --- it seems like a larger workspace
should result in more difference because of greater chance to dump a
lot of tuples into the index at once.  Do you know why that's happening?


I suppose that if maintenance_work_mem is big enough, then all the index data
accumulates in memory and is written to disk at once. With that test's options the
index size is about 40MB.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] still gin index creation takes forever

2008-11-12 Thread Teodor Sigaev
GIN's build algorithm can use bulk insertion of ItemPointers if and only if they
are to be inserted on the rightmost page (the exact piece of code is dataPlaceToPage()
in gindatapage.c, lines 407-427).


I'm not following.  Rightmost page of what --- it can't be the whole
index, can it, or the case would hardly ever apply?


A GIN index contains a btree over the keys (the entry tree), and for each key it
contains either a list of ItemPointers (a posting list) or a btree over ItemPointers
(a posting tree, or data tree), depending on how many there are. The bulk insertion
process collects keys and sorted arrays of ItemPointers in memory, and then, for each
key, it tries to insert every ItemPointer from the array into the corresponding data
tree one by one. But if the smallest ItemPointer in the array is greater than the
biggest one already stored, the algorithm inserts the whole array onto the rightmost
page of the data tree.


So in that case the process can insert about 1000 ItemPointers per data tree lookup;
in the opposite case it does 1000 lookups in the data tree.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] still gin index creation takes forever

2008-11-12 Thread Teodor Sigaev

Any suggestion about how to track down the problem?


What you are describing sounds rather like a use-of-uninitialized-memory
problem, wherein the behavior depends on what happened to be in that
memory previously.  If so, using a debug/cassert-enabled build of
Postgres might help to make the behavior more reproducible.


It seems to me that a possible reason for that behavior could be the order in which
the table is scanned. GIN's build algorithm prefers a scan from the beginning to the
end of the table, but in 8.3 that's not always what happens - the scan may begin from
the middle or the end of the table, depending on the history of sequential scans.


GIN's build algorithm can use bulk insertion of ItemPointers if and only if they
are to be inserted on the rightmost page (the exact piece of code is dataPlaceToPage()
in gindatapage.c, lines 407-427).


Is there any way to force the table scan to start from the beginning?
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Weird problem concerning tsearch functions built into postgres 8.3, assistance requested

2008-10-30 Thread Teodor Sigaev
One of the tables we're using in the 8.1.3 setups currently running 
includes phone numbers as a searchable field (fti_phone), with the 
results of a select on the field generally looking like this: 'MMM':2 
'':3 'MMM-':1.  MMM is the first three digits,  is the 
fourth-seventh.


The weird part is this: On the old systems running 8.1.3, I can look up 
a record by
fti_phone using any of the three above items; first three, last four, or 
entire number including dash.  On the new system running 8.3.1, I can do 
a lookup by the first three or the last four and get the results I'm 
after, but any attempt to do a lookup by the entire MMM- version 
returns no records.


Parser was changed:
postgres=# select  * from ts_debug('123-4567');
 alias |   description| token | dictionaries | dictionary | lexemes
---+--+---+--++-
 uint  | Unsigned integer | 123   | {simple} | simple | {123}
 int   | Signed integer   | -4567 | {simple} | simple | {-4567}
(2 rows)
postgres=# select  * from ts_debug('abc-defj');
  alias  |   description   |  token   |  dictionaries 
|  dictionary  |  lexemes

-+-+--++--+
 asciihword  | Hyphenated word, all ASCII  | abc-defj | {english_stem} 
| english_stem | {abc-defj}
 hword_asciipart | Hyphenated word part, all ASCII | abc  | {english_stem} 
| english_stem | {abc}
 blank   | Space symbols   | -| {} 
|  |
 hword_asciipart | Hyphenated word part, all ASCII | defj | {english_stem} 
| english_stem | {defj}


The parser in 8.1 treats any [alnum]+-[alnum]+ as a hyphenated word, but 8.3 treats
[digit]+-[digit]+ as two separate numbers.


So you can either pre-process the text before indexing, or have a look at the
regex dictionary (http://vo.astronet.ru/arxiv/dict_regex.html).
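
A rough sketch of the pre-processing idea: index a hyphen-free copy of the number next to the original text, so the full form matches however the parser splits it (the literals are just the example number from above):

SELECT to_tsvector('simple', '123-4567' || ' ' || translate('123-4567', '-', ''))
       @@ plainto_tsquery('simple', '123-4567');
-- true: the tsvector contains '123', '-4567' and '1234567'
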
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4

2008-10-22 Thread Teodor Sigaev

Fixed, patch attached.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/
diff -c -r src.orig/backend/access/gist/gistget.c src/backend/access/gist/gistget.c
*** src.orig/backend/access/gist/gistget.c	2008-10-22 12:07:39.000000000 +0400
--- src/backend/access/gist/gistget.c	2008-10-22 15:13:23.000000000 +0400
***************
*** 49,55 ****
  
  	for (offset = FirstOffsetNumber; offset <= maxoff; offset = OffsetNumberNext(offset))
  	{
! 		IndexTuple	ituple = (IndexTuple) PageGetItem(p, PageGetItemId(p, offset));
  
  		if (ItemPointerEquals(&(ituple->t_tid), iptr))
  		{
--- 49,55 ----
  
  	for (offset = FirstOffsetNumber; offset <= maxoff; offset = OffsetNumberNext(offset))
  	{
! 		IndexTuple	ituple = (IndexTuple) PageGetItem(p, PageGetItemId(p, offset));
  
  		if (ItemPointerEquals(&(ituple->t_tid), iptr))
  		{
***************
*** 157,163 ****
  	{
  		while( ntids < maxtids && so->curPageData < so->nPageData )
  		{
! 			tids[ ntids ] = scan->xs_ctup.t_self = so->pageData[ so->curPageData ];
  
  			so->curPageData ++;
  			ntids++;
--- 157,167 ----
  	{
  		while( ntids < maxtids && so->curPageData < so->nPageData )
  		{
! 			tids[ ntids ] = scan->xs_ctup.t_self = so->pageData[ so->curPageData ].heapPtr;
! 			ItemPointerSet(&(so->curpos),
! 						   BufferGetBlockNumber(so->curbuf),
! 						   so->pageData[ so->curPageData ].pageOffset);
! 
  
  			so->curPageData ++;
  			ntids++;
***************
*** 251,258 ****
  		{
  			while( ntids < maxtids && so->curPageData < so->nPageData )
  			{
! 				tids[ ntids ] = scan->xs_ctup.t_self = so->pageData[ so->curPageData ];
  
  				so->curPageData ++;
  				ntids++;
  			}
--- 255,267 ----
  		{
  			while( ntids < maxtids && so->curPageData < so->nPageData )
  			{
! 				tids[ ntids ] = scan->xs_ctup.t_self =
! 					so->pageData[ so->curPageData ].heapPtr;
  
+ 				ItemPointerSet(&(so->curpos),
+ 							   BufferGetBlockNumber(so->curbuf),
+ 							   so->pageData[ so->curPageData ].pageOffset);
+ 
  				so->curPageData ++;
  				ntids++;
  			}
***************
*** 297,309 ****
  			 * we can efficiently resume the index scan later.
  			 */
  
- 			ItemPointerSet(&(so->curpos),
- 						   BufferGetBlockNumber(so->curbuf), n);
- 
  			if (!(ignore_killed_tuples && ItemIdIsDead(PageGetItemId(p, n))))
  			{
  				it = (IndexTuple) PageGetItem(p, PageGetItemId(p, n));
! 				so->pageData[ so->nPageData ] = it->t_tid;
  				so->nPageData ++;
  			}
  		}
--- 306,316 ----
  			 * we can efficiently resume the index scan later.
  			 */
  
  			if (!(ignore_killed_tuples && ItemIdIsDead(PageGetItemId(p, n))))
  			{
  				it = (IndexTuple) PageGetItem(p, PageGetItemId(p, n));
! 				so->pageData[ so->nPageData ].heapPtr = it->t_tid;
! 				so->pageData[ so->nPageData ].pageOffset = n;
  				so->nPageData ++;
  			}
  		}
diff -c -r src.orig/backend/access/gist/gistscan.c src/backend/access/gist/gistscan.c
*** src.orig/backend/access/gist/gistscan.c	2008-10-22 12:07:39.000000000 +0400
--- src/backend/access/gist/gistscan.c	2008-10-22 14:55:58.000000000 +0400
***************
*** 163,169 ****
  	so

Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4

2008-10-22 Thread Teodor Sigaev

20 hours to find the fix Teodor, Kudos !

Nothing to be proud of :( - it was my bug.


Due to the importance of the fix, will we see very soon a 8.3.5 ?

I don't know, see the discussion. I think that would make sense.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4

2008-10-21 Thread Teodor Sigaev

Thank you, I reproduced the bug and will fix it.

Sergey Konoplev wrote:

Ok, I've done the test case (see attachment).

8.3.3 has passed it.
8.3.4 hasn't passed in ~99% times I run it.

Steps to reproduce:
1. install pg 8.3.4, do initdb, start pg
2. correct PSQL parameter in pg-8.3.4_index_update_test.sh
3. run pg-8.3.4_index_update_test.sh few times

And you will see something like this:

...

--
2nd time obtaining seq-scan count and plan...
--
SELECT table1_flag, count(*) FROM table1
GROUP BY table1_flag;
 table1_flag | count
-+---
  1 |   100
(1 row)

EXPLAIN ANALYZE SELECT table1_flag, count(*) FROM table1
GROUP BY table1_flag;
   QUERY PLAN
---
 HashAggregate  (cost=15.00..15.01 rows=1 width=2) (actual
time=0.140..0.140 rows=1 loops=1)
  -  Seq Scan on table1  (cost=0.00..12.00 rows=600 width=2) (actual
time=0.004..0.059 rows=100 loops=1)
 Total runtime: 0.172 ms
(3 rows)

--
2nd time obtaining index-scan count and plan...
--
SELECT count(*) FROM table1
WHERE table1_flag = 1;
 count
---
   98
(1 row)

EXPLAIN ANALYZE SELECT count(*) FROM table1
WHERE table1_flag = 1;
 QUERY
PLAN
--
 Aggregate  (cost=8.27..8.28 rows=1 width=0) (actual time=0.451..0.451
rows=1 loops=1)
  -  Index Scan using i_table1__table1_point on table1
(cost=0.00..8.27 rows=1 width=0) (actual time=0.011..0.408 rows=98
loops=1)
 Total runtime: 0.477 ms
(3 rows)

--
Regards,
Sergey Konoplev
--
PostgreSQL articles in english  russian
http://gray-hemp.blogspot.com/search/label/postgresql/


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4

2008-10-20 Thread Teodor Sigaev

Hmm.  So the problem seems to be statable as a full-index scan on a
GIST index might fail to return all the rows, if the index has been
modified since creation.  Teodor, can you think of anything you
changed recently in that area?


Only a fix for possible duplicates during index scans. I will look into it.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4

2008-10-20 Thread Teodor Sigaev

Hmm.  So the problem seems to be statable as a full-index scan on a
GIST index might fail to return all the rows, if the index has been
modified since creation.  Teodor, can you think of anything you
changed recently in that area?


I still can't reproduce the bug, but I found a useless recheck condition in the
bitmap scan:


drop table if exists qq;
select 1 as st , 1::int4 as t into qq from generate_series(1,1) as t;
create index qqidx on qq using gist (st) where t =1;
INSERT INTO qq (SELECT (4 * random())::int4, (4 * random())::int4 from 
generate_series(1,1));


# explain select t, count(1) from qq where t =1 group by t;
 QUERY PLAN
-
 GroupAggregate  (cost=19.62..633.49 rows=1 width=2)
   -  Bitmap Heap Scan on qq  (cost=19.62..630.28 rows=640 width=2)
 Recheck Cond: (t = 1)
 -  Bitmap Index Scan on qqidx  (cost=0.00..19.46 rows=640 width=0)

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Using ISpell dictionary - headaches...

2008-07-22 Thread Teodor Sigaev


It *may* be because I'm using psql 8.0.3 and not the latest version (but 
I'm stucked with that version), i'm just hoping that one of you have met 


Upgrade to 8.0.17 - there were several fixes in the ISpell code.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] changing text search treatment of puncutation

2008-07-03 Thread Teodor Sigaev



In general there seem to be a lot of ways that people wish they
could tweak the text search parser, and telling them to write
their own parser isn't a very helpful response for most folk.
I don't have an idea about how to improve the situation, but
it seems like something that should be thought about.


Oleg and I have thought hard about it, and we haven't found a solution yet.
A configurable parser should be:
- fast
- flexible
- not error-prone
- comfortable for a non-programmer to use (at least for a non-C programmer)

It might be a table-driven state machine (just put TParserStateAction into a
table or tables, with some caching for the first step), but that is complex to
operate, and the correctness of any change to the states would have to be proven
before it is put into use.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] multi-word expression full-text searching

2008-07-01 Thread Teodor Sigaev

If I understand well the plainto_tsquery behaviour, this query match with:
Despite this, the president went out.
Despite the event, this question arise.


Right, you mean phrase search. Read the thread: 
http://archives.postgresql.org/pgsql-hackers/2008-05/msg0.php


The suggested patch should be packaged as a module, I think.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] multi-word expression full-text searching

2008-06-30 Thread Teodor Sigaev


SELECT id FROM document WHERE to_tsvector('english',text) @@
plainto_tsquery('english','despite this');
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fragments in tsearch2 headline

2008-05-23 Thread Teodor Sigaev

[moved to -hackers, because talk is about implementation details]


I've ported the patch of Sushant Sinha for fragmented headlines to pg8.3.1
(http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php)

Thank you.

1 diff -Nrub postgresql-8.3.1-orig/contrib/tsearch2/tsearch2.c
contrib/tsearch2 is now a compatibility layer for old applications - they don't
know about the new features. So this part isn't needed.

2 Compiling the function (ts_headline_with_fragments) into core but using it only
from the contrib module looks very odd. The new feature could then be used only
through the compatibility layer for the old release :)

3 headline_with_fragments() is hardcoded to use the default parser, but what
happens when the configuration uses another parser? For example, for the
Japanese language.

4 I would prefer the signature ts_headline( [regconfig,] text, tsquery [,text] ),
and the function should accept 'NumFragments=N' for the default parser. Other
parsers may use other options.

5 It just doesn't work correctly, because the new code doesn't take parser-specific
lexeme types into account:
contrib_regression=# select headline_with_fragments('english', 'wow asd-wow wow', 'asd', '');
     headline_with_fragments
----------------------------------
 ...wow asd-wow<b>asd</b>-wow wow
(1 row)


So I am inclined to use the existing framework/infrastructure, although it may be
subject to change.

Some description:
1 ts_headline determines the correct parser to use
2 it calls hlparsetext to split the text into a structure suitable for both goals:
finding the best fragment(s) and concatenating those fragment(s) back into the
text representation
3 it calls the parser-specific method prsheadline, which works with the pre-parsed
text (the parsing was done in hlparsetext); the method should mark the needed
words/parts/lexemes etc.
4 ts_headline glues the fragments into text and returns that.

We need a parser-specific headline method because only the parser knows everything
about its lexemes.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] tsearch2 on-demand dictionary loading using functions in tsearch2

2008-05-18 Thread Teodor Sigaev
 * Considering the database is loaded separately for each session, does 
this also imply that each running backend has a separate dictionary 
stored in memory?


Yes.


As for downsides, I only really see two:
 * Tracking updates of dictionaries - but it's reasonable to believe 
that new connections get open more often than the dictionary gets 
updated. Also, this might be easily solved by stat()-ing the dictionary 
file before starting up session, and only have the server reload it if 
there's a notified change.

 * Possibly complicated to implement?


Keeping the dictionary up to date is the most difficult part here. The configuration
of a dictionary can be changed with an ALTER command - so the parent process (and all
currently running backends) would have to get that information in order to reload the dictionary.



As for my second question, is it possible to use functions in tsearch2? 
For example, writing my own stemmer in PL/pgSQL or in C as a postgres 
function.


Yes, of course, you can develop your own dictionary (or dictionaries) and parser.
But only in C, because they are performance-critical.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] tsearch2 on-demand dictionary loading using functions in tsearch2

2008-05-18 Thread Teodor Sigaev



Hmm, good point; I presume accept the fact that settings change won't 
propagate to other backends until reconnect would not be acceptable 
behavior, even if documented along with the relevant configuration option?


I suppose so. That was one of the reasons to move tsearch into core, and it would
be too regrettable to lose that feature again.


As for my second question, is it possible to use functions in 
tsearch2? For example, writing my own stemmer in PL/pgSQL or in C as 
a postgres function.
I've had something different in mind. Considering there are already 
facilities to use functions, be it PL/pgSQL, PL/Python or C, why not 
just use those with the condition that the function must accept 
some-arguments and return some-result? Or would using this, even while 
using C as the language used for the actual parser, slow things down too?


The dictionary and parser API intentionally uses complex (and nested) C structures
to reduce overhead. While parsing text, postgres makes two parser calls per token
(one call returns a word, the next the word delimiter - a space is a lexeme too,
although it is not subject to indexing) and one dictionary call per word. So if
your language can work with C structures, you can use it with tsearch, with a
larger or smaller performance penalty. PL/pgSQL doesn't have this capability.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Direct access to GIST structure

2008-04-04 Thread Teodor Sigaev



encode spatial proximity. Is there an API (backend C-level is fine) to
access a GIST index?


The best way would be to extend the existing GiST interface to support KNN search.
But you can see how to access the index structure from a module in the gevel module
(http://www.sigaev.ru/cvsweb/cvsweb.cgi/gevel/). The GiST-related functions in that
module were invented to help developers, not for production use, so they acquire an
exclusive lock on the index.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Direct access to GIST structure

2008-04-04 Thread Teodor Sigaev

I just stumbled on http://www.cs.purdue.edu/spgist/ which seems like
exactly what I need.


It doesn't work with 8.2 and up, because since 8.2 an index access method has to take
care of concurrent access itself, and that implementation doesn't.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fragments in tsearch2 headline

2008-03-31 Thread Teodor Sigaev

The patch takes into account the corner case of overlap. Here is the
code for that
// start check
if (!startHL && *currentpos >= startpos)
   startHL = 1;

The headline generation will not start until currentpos has gone past
startpos. 

Ok



You can also check how this headline function is working at my website
indiankanoon.com. Some example queries are murder, freedom of speech,
freedom of press etc.

Looks good.


Should I develop the patch for the current cvs head of postgres?


I'd like to commit your patch, but it should be:
 - against current HEAD
 - an extension of the existing ts_headline.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fragments in tsearch2 headline

2008-03-17 Thread Teodor Sigaev



Teodor, Oleg, do we want this?
http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php


I suppose we want it. But there are some questions/issues:
- Is it necessary to introduce a new function? Maybe it would be better to add an
option to the existing headline function. I'd like to keep the current layout:
ts_headline provides a common interface to headline generation; finding and
marking fragments is the job of the parser's headline method, and generating the
exact pieces of text is done by ts_headline.

- Covers may overlap, so overlapping fragments will look odd.


In any case, the patch was developed for the contrib version of tsearch.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text index and most frequently used words

2008-02-08 Thread Teodor Sigaev

What I'd like to know is if there is an easy to way to use the full
text index to generate a list of the most common words.  I could write
this code manually, but I'm hoping there's a better (simpler) way.


For 8.3:
http://www.postgresql.org/docs/8.3/static/textsearch-features.html#TEXTSEARCH-STATISTICS

For versions before 8.3, just use the stat() function instead of ts_stat().
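
For example (the table and column names are hypothetical):

SELECT word, ndoc, nentry
  FROM ts_stat('SELECT body_tsv FROM documents')
 ORDER BY nentry DESC, ndoc DESC
 LIMIT 20;
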
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] postgres 8.3 rc-1 ispell installation problem

2008-01-20 Thread Teodor Sigaev

flag *J:# isimo
   E -E, 'ISIMO  #  grand'isimo  -- here  432
E-E, 'ISIMOS   # grande grand'isimos
E-E, 'ISIMA# grande grand'isima
E-E, 'ISIMAS   # grande grand'isimas
O-O, 'ISIMO# tonto tont'isimo
O-O, 'ISIMA# tonto tont'isima


The current implementation doesn't accept any characters other than alphabetic ones in an ending.

i think 'I..  word is not correct for ispell, 
this should be one Í letter 

That's right, but you should convert the dictionary and affix files to UTF8 encoding.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Segmentation fault with 8.3 FTS ISpell

2008-01-16 Thread Teodor Sigaev

The fixes are committed to CVS; I hope they will help you.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Segmentation fault with 8.3 FTS ISpell

2008-01-15 Thread Teodor Sigaev

I tried to reproduce the bug, but without success.
Could you provide a dump of the text column?


Hannes Dorbath wrote:
Crash happens about 7 minutes after issuing the UPDATE statement with 
current CVS HEAD. The table has around 5 million rows. It's always 
reproducible.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Tsearch2 - spanish

2007-09-19 Thread Teodor Sigaev

prueba1=# select to_tsvector('espanol','melón  perro mordelón');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
! 



Hmm, can you provide a backtrace?

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Tsearch2 - spanish

2007-09-18 Thread Teodor Sigaev

prueba=# select to_tsvector('espanol','melón');
ERROR:  Affix parse error at 506 line

and

prueba=# select lexize('sp','melón');
 lexize  
-

 {melon}
(1 row)


That looks very strange - can you provide the list of dictionaries and the configuration map?


I tried many dictionaries with the same results. Also I change the
codeset of files :aff and dict (from latin1 to utf8 and utf8 to
iso88591) and got the same error

where  can I investigate for resolve about this problem?

My dictionary at 506 line had:

Where did you get this file? And what are the encoding/locale settings of your db?

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] tsearch2 anomoly?

2007-09-07 Thread Teodor Sigaev
Ordinary text doesn't have strict syntax rules, so the parser tries to recognize the
most probable token. Something containing '.', '-' and alphanumeric characters is often
a filename, but a filename very rarely starts or ends with a dot.


RC Gobeille wrote:

Thanks and I didn't know about ts_debug, so thanks for that also.

For the record, I see how to use my own processing function (e.g.
dropatsymbol) to get what I need:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro
.html

However, can you explain the logic behind the parsing difference if I just
add a .s to a string:


ossdb=# select ts_debug('gallery2-httpd-2.1-conf.');
ts_debug
---
 (default,hword,Hyphenated word,gallery2-httpd-2,{simple},'2' 'httpd'
'gallery2' 'gallery2-httpd-2')
 (default,part_hword,Part of hyphenated word,gallery2,{simple},'gallery2')
 (default,lpart_hword,Latin part of hyphenated
word,httpd,{en_stem},'httpd')
 (default,float,Decimal notation,2.1,{simple},'2.1')
 (default,lpart_hword,Latin part of hyphenated word,conf,{en_stem},'conf')
(5 rows)

ossdb=# select ts_debug('gallery2-httpd-2.1-conf.s');
  ts_debug
-
 (default,host,Host,gallery2-httpd-2.1-conf.s,{simple},'gallery2-httpd-2.1-c
onf.s')
(1 row)

Thanks again,
Bob


On 9/6/07 11:19 AM, Oleg Bartunov [EMAIL PROTECTED] wrote:


This is how default parser works.  See output from
select * from ts_debug('gallery2-httpd-conf');
and
select * from ts_debug('httpd-2.2.3-5.src.rpm');

All token type:

select * from token_type();


On Thu, 6 Sep 2007, RC Gobeille wrote:


I'm having trouble understanding to_tsvector.  (PostreSQL 8.1.9 contrib)

In this first case converting 'gallery2-httpd-conf' makes sense to me and is
exactly what I want.  It looks like the entire string is indexed plus the
substrings broken by '-' are indexed.


ossdb=# select to_tsvector('gallery2-httpd-conf');
 to_tsvector
-
'conf':4 'httpd':3 'gallery2':2 'gallery2-httpd-conf':1


However, I'd expect the same to happen in the httpd example - but it does not
appear to.

ossdb=# select to_tsvector('httpd-2.2.3-5.src.rpm');
  to_tsvector
---
'httpd-2.2.3-5.src.rpm':1

Why don't I get: 'httpd', 'src', 'rpm', 'httpd-2.2.3-5.src.rpm' ?

Is this a bug or design?


Thank you!
Bob

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83





--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] PickSplit method of 2 columns ... error

2007-08-28 Thread Teodor Sigaev
The page split algorithm was rewritten for 8.2 to handle multicolumn indexes, and the API for
user-defined pickSplit functions was extended to give better results during index
creation. GiST can still work with the old functions - and that is what this message says.
It does not indicate a real problem or error - the index will simply be the same as in 8.1,
not better.




Kevin Neufeld wrote:

Has anyone come across this error before?

LOG:  PickSplit method of 2 columns of index 
'asset_position_lines_asset_cubespacetime_idx' doesn't support secondary 
split


This is a multi-column GiST index on an integer and a cube (a data type 
from the postgres cube extension module).


I traced the error to the gistUserPicksplit function in
gistsplit.c ... I surmise that this method is called whenever a page 
split is necessary.


So, I know when this error occurs, but I don't know why.

Thoughts anyone?
Cheers,
Kevin



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Regression - Query requires full scan, GIN doesn't support it

2007-06-22 Thread Teodor Sigaev

Is this a permanent limitation of GIN, or is a fix possible?
Permanent. You can check the user input with the querytree() function --- if it returns 
the string 'T', a full scan will be needed. If your tsquery is produced by a 
plainto_tsquery() call, then such a query will not find any results anyway, so you can show 
the user an empty page.
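
For illustration, a rough sketch of that check (the table "documents" and column "fti" are hypothetical):

-- querytree() returns the string 'T' when the tsquery would force a
-- full scan, e.g. when it consists only of a negation:
select querytree(to_tsquery('default', '!something'));   -- expected: T
select querytree(to_tsquery('default', 'cat & dog'));    -- expected: a real tree

-- Application-side guard: run the indexed search only when the check passes:
-- select * from documents where fti @@ to_tsquery('default', $1);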



Is a fix being worked on?
If a fix is forthcoming, will it be available in the 8.2 series or only 8.3+?


Possibly a full fix in 8.4, but I will not promise it.
8.3 will have protection against queries that don't match anything.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile

2007-06-04 Thread Teodor Sigaev

2007-06-01 23:00:00.001 CEST:% LOG:  GIN incomplete splits=8


Just to be sure: the patch fixes the *creation* of the WAL records, not their replay. So the primary 
database should be patched too.


Over the weekend I found a possible deadlock in the GIN locking protocol between 
concurrent UPDATE and VACUUM queries involving the same GIN index. Strangely, 
I didn't see it in 8.2, and even now I can't reproduce it there. It's easy to 
reproduce only on HEAD, with the recently added ReadBufferWithStrategy() call in place 
of ReadBuffer(). ReadBufferWithStrategy() was added to implement a 
limited-size ring of buffers for VACUUM. Nevertheless, the scenario is possible in 8.2.


The attached patch fixes that deadlock bug too. Also, the previous version of my patch 
had a mistake that shows up on CREATE INDEX .. USING GIN.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/


patch_wal_gin.v6.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile

2007-06-04 Thread Teodor Sigaev

1. After a certain point, consecutive GIN index splits cause a problem.
The new RHS block numbers are consecutive from 111780+

That's the newly created page. The split page might have any block number.



2. The incomplete splits stay around indefinitely after creation and we
aren't trying to remove the wrong split at any point. We're either never
creating an xlog record, or we are ignoring it in recovery, or we are
somehow making multiple entries then not removing all of them.

Agreed



3. The root seems to move, which isn't what I personally was expecting
to see. It seems root refers to the highest parent involved in the
split.
'Root' in this context means the parent of the split page. Actually, there is a lot of 
B-tree inside GIN, see http://www.sigaev.ru/gin/GinStructure.pdf


4. We're writing lots of redo in between failed page splits. So *almost*
everything is working correctly.

5. This starts to happen when we have very large indexes. This may be
coincidental but the first relation file is fairly full (900+ MB).


Yes. It seems to me that the conditions for the error are very rare: the B-tree over 
ItemPointers (the second level of GIN) has a large capacity, 1000+ items per page, so 
splits occur rather rarely.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile

2007-06-04 Thread Teodor Sigaev

Oops. The patch doesn't apply cleanly. New version attached.


The attached patch fixes that deadlock bug too. Also, the previous version of my 
patch had a mistake that shows up on CREATE INDEX .. USING GIN.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/


patch_wal_gin.v7.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile

2007-06-04 Thread Teodor Sigaev

After some observation of massive reindexing of some hundred thousand
data sets it seems to me that the slave doesn't skip checkpoints
anymore. (Apart from those skipped because of the CheckpointTimeout thing)
I'll keep an eye on it and report back any news on the issue.


Nice, committed. Thanks for your report and testing.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] warm standby server stops doing checkpoints afterawhile

2007-06-01 Thread Teodor Sigaev

2007-06-01 13:11:29.365 CEST:% DEBUG:  0: Ressource manager (13)
has partial state information

To me, this points clearly to there being an improperly completed action
in resource manager 13. (GIN) In summary, it appears that there may be
an issue with the GIN code for WAL recovery and this is effecting the
Warm Standby.


Hmm. I found that gin_xlog_cleanup() doesn't reset the incomplete_splits list. Could that be 
the reason for the bug?


Attached patch fixes it.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/
*** ./src/backend/access/gin/ginxlog.c.orig Fri Jun  1 16:47:47 2007
--- ./src/backend/access/gin/ginxlog.c  Fri Jun  1 16:53:47 2007
***
*** 594,599 
--- 594,600 
  
MemoryContextSwitchTo(topCtx);
MemoryContextDelete(opCtx);
+   incomplete_splits = NIL;
  }
  
  bool

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] warm standby server stops doing checkpointsafterawhile

2007-06-01 Thread Teodor Sigaev

2007-06-01 16:28:51.708 CEST:% LOG:  GIN incomplete split root:8
l:45303 r:111740 at redo CA/C8243C28

...

2007-06-01 16:38:23.133 CEST:% LOG:  GIN incomplete split root:8
l:45303 r:111740 at redo CA/C8243C28


Looks like a bug in GIN. I'll play with it. Can you provide more details about 
your test?



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile

2007-06-01 Thread Teodor Sigaev



I'd suggest we throw an error, as shown in the enclosed patch. Frank,
can you give that a whirl to provide Teodor with something more to work
with? Thanks.


I have already made a test suite that reproduces the problem - it leaves incomplete 
splits behind. But I discovered one more problem: a deadlock on a buffer lock. So right 
now I am investigating that.




Neither GIST nor B-tree seems to throw an error in corresponding
locations also, so the potential for not being able to track this is
high. I'd want to throw errors in those locations also.


Agreed, I'll add more checks.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile

2007-06-01 Thread Teodor Sigaev


Found the reason: if the parent page gets a full-page backup in WAL after the child's split, then 
forgetIncompleteSplit() isn't called at all.


I hope the attached patch fixes that. Please test it.

PS: I'm going away for the weekend, so I won't be online until Monday.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/


patch_wal_gin.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] TSEARCH2: disable stemming in indexes and triggers

2007-05-31 Thread Teodor Sigaev
I found out that using 'simple' instead of 'default' when using 
to_tsvector() does exactly that, but I don't know how to change my 
triggers and indexes to keep doing the same (using 'simple'). 


Suppose your database was initialized with the C locale. Then just mark the 
'simple' configuration as the default:


# update pg_ts_cfg set locale=null where ts_name='default';
# update pg_ts_cfg set locale='C' where ts_name='simple';

If your locale setting is not C, then mark the configuration you need with your 
locale.
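
To verify which configuration tsearch2 will now pick by default (a quick sanity check; tsearch2 matches pg_ts_cfg.locale against the server locale):

show lc_ctype;
select ts_name, locale from pg_ts_cfg;
select show_curcfg();    -- OID of the configuration currently in use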


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] opclass for real[]

2007-05-30 Thread Teodor Sigaev
ERROR:  data type real[] has no default operator class for access method 
gist
HINT:  You must specify an operator class for the index or define a 
default operator class for the data type.

There is a GIN operator class for real[].
http://www.postgresql.org/docs/8.2/static/xindex.html#XINDEX-GIN-ARRAY-STRAT-TABLE
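
As a minimal sketch (assuming a hypothetical table "samples" with a real[] column "vals"), no explicit operator class is needed because 8.2 ships default GIN opclasses for the built-in array types:

create index samples_vals_gin_idx on samples using gin (vals);

-- the index then serves the array operators from the table linked above, e.g.
select * from samples where vals @> array[1.5, 2.0]::real[];
select * from samples where vals && array[3.25]::real[];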



Is there a opclass defined in 8.2 or I have to create one. In either 
case can you please give a link for information on opclasses.


Thanks
Abhang


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] crash creating tsearch2 index

2007-05-28 Thread Teodor Sigaev

Could you provide a test suite?

John DeSoi wrote:

Hi,

I'm trying to dump and restore a copy of a database in the same cluster. 
pg_restore would abort when creating a tsearch2 gist index. So I dumped 
to text removed the CREATE INDEX commands and tried to do that at the 
end after the rest of the database was loaded. I still have the same 
problem:


CREATE INDEX song_tsx_title_idx ON song USING gist (tsx_title 
public.gist_tsvector_ops);

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

This is pg 8.0.8 in a shared hosting environment, so I don't have a lot 
of options for tweaking. Is there a known work-around for this?


Thanks,




John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Postgresql 8.2.4 crash with tsearch2

2007-05-24 Thread Teodor Sigaev

Please check your steps, or tell me where I'm wrong :)
If you still have problems, I can look into it if I get access to your 
development server...


% cd PGSQL_SRC
% zcat ~/tmp/tsearch_snowball_82-20070504.gz| patch -p0
% cd contrib/tsearch2
% gmake && su -c 'gmake install' && gmake installcheck
% cd gendict
% cp ~/tmp/libstemmer_c/src_c/stem_UTF_8_french.c stem.c
% cp ~/tmp/libstemmer_c/src_c/stem_UTF_8_french.h stem.h
% ./config.sh -n fr -s -p french_UTF_8 -v -C'Snowball stemmer for 
French - UTF8'

% cd ../../dict_fr
% gmake && su -c 'gmake install'
% psql contrib_regression < dict_fr.sql

contrib_regression=# select lexize('fr', 'sortir'), lexize('fr', 
'service'), lexize('fr', 'chose');

 lexize |  lexize  | lexize
--------+----------+--------
 {sort} | {servic} | {chos}
(1 row)

contrib_regression=# select lexize('fr', 'as');
 lexize
--------
 {}



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] tsearch2 dictionary that indexes substrings?

2007-04-24 Thread Teodor Sigaev

My colleague who speaks more C than me came up with the code below
which works fine for us. Will the memory allocated for lexeme be freed

Nice, except for the self-defined UTF-8 properties. I think it would be much better to use
pg_mblen(char *); in that case your dictionary will work with any encoding supported 
by PostgreSQL.



by the caller?

Yes, of course.
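
As a minimal C sketch of that idea (not the poster's actual dictionary code), walking a lexeme with pg_mblen() so it works with any server encoding:

#include "postgres.h"
#include "mb/pg_wchar.h"

static int
count_chars(const char *word, int len)
{
    int         nchars = 0;
    const char *p = word;

    while (p < word + len)
    {
        p += pg_mblen(p);       /* byte length of the current character */
        nchars++;
    }
    return nchars;
}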


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] indexing array columns

2007-04-16 Thread Teodor Sigaev

you should be able to index the way you want. In contrib there a module
cube which does similar to what you want to 3D, extending it to 12D
shouldn't be too hard...


The contrib/cube module implements an N-dimensional cube representation.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Tsearch2 crashes my backend, ouch !

2007-04-02 Thread Teodor Sigaev

Fixed. Thanks for the report.

Anyway, just to signal that tsearch2 crashes if SELECT is not 
granted to pg_ts_dict (other tables give a proper error message when 
not GRANTed).

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] to_tsvector in 8.2.3

2007-04-02 Thread Teodor Sigaev

Sorry, no - I tested on CVS HEAD, so the DLL isn't compatible :(
Wait a bit for 8.2.4.

richardcraig wrote:

Teodor

As a non-C windows user (yes - throw stones at me :) ) Do you have a fixed
dll for this patch that I can try?

Thanks

Richard


Teodor Sigaev-2 wrote:

Solved, see the attached patch. I found an old Celeron-300 box and installed Windows
on it, and it was very slow :)




Nope, same result with this patch.

Thank you.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
WWW:
http://www.sigaev.ru/

*** ./contrib/tsearch2.orig/./wordparser/parser.c   Thu Mar 22 18:39:23 2007
--- ./contrib/tsearch2/./wordparser/parser.cThu Mar 22 18:51:23 2007
***
*** 117,123 
{
if (lc_ctype_is_c())
{
!   unsigned int c = *(unsigned int*)(prs->wstr + prs->state->poschar);
  
  			/*

 * any non-ascii symbol with multibyte encoding
--- 117,123 
{
if (lc_ctype_is_c())
{
!   unsigned int c = *(prs->wstr + prs->state->poschar);
  
  			/*

 * any non-ascii symbol with multibyte encoding


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq






--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Tsearch2 crashes my backend, ouch !

2007-03-30 Thread Teodor Sigaev

which version of pgsql exactly?

Listmail wrote:


Hello,

I have just ditched Gentoo and installed a brand new kubuntu system 
(was tired of the endless compiles).
I have a problem with crashing tsearch2. This appeared both on 
Gentoo and the brand new kubuntu.


I will describe all my install procedure, maybe I'm doing something 
wrong.


Cluster is newly created and empty.

initdb was done with UNICODE encoding  locales.

# from postgresql.conf

# These settings are initialized by initdb -- they might be changed
lc_messages = 'fr_FR.UTF-8' # locale for system 
error message strings
lc_monetary = 'fr_FR.UTF-8' # locale for monetary 
formatting
lc_numeric = 'fr_FR.UTF-8'  # locale for number 
formatting
lc_time = 'fr_FR.UTF-8' # locale for time 
formatting


[EMAIL PROTECTED]:~$ locale
LANG=fr_FR.UTF-8
LC_CTYPE=fr_FR.UTF-8
LC_NUMERIC=fr_FR.UTF-8
etc...

First import needed .sql files from contrib and check that the 
default tsearch2 config works for English


$ createdb -U postgres test
$ psql -U postgres test < tsearch2.sql and other contribs I use
$ psql -U postgres test

test=# select lexize( 'en_stem', 'flying' );
 lexize

 {fli}

test=# select to_tsvector('default', 'flying ducks');
   to_tsvector
--
 'fli':1 'duck':2

OK, seems to work very nicely, now install French.
Since this is Kubuntu there is no source, so download source, then :

- apply patch_tsearch_snowball_82 from tsearch2 website

./configure --prefix=/usr/lib/postgresql/8.2/ 
--datadir=/usr/share/postgresql/8.2 --enable-nls=fr --with-python

cd contrib/tsearch2
make
cd gendict
(copy french stem.c and stem.h from the snowball website)
./config.sh -n fr -s -p french_UTF_8 -i -v -c stem.c -h stem.h 
-C'Snowball stemmer for French'

cd ../../dict_fr
make clean  make
sudo make install

Now we have :

/bin/sh ../../config/install-sh -c -m 644 dict_fr.sql 
'/usr/share/postgresql/8.2/contrib'
/bin/sh ../../config/install-sh -c -m 755  libdict_fr.so.0.0 
'/usr/lib/postgresql/8.2/lib/dict_fr.so'


Okay...

- download and install UTF8 french dictionaries from 
http://www.davidgis.fr/download/tsearch2_french_files.zip and put them 
in contrib directory

(the files delivered by debian package ifrench are ISO8859, bleh)

- import french shared libs
psql -U postgres test < /usr/share/postgresql/8.2/contrib/dict_fr.sql

Then :

test=# select lexize( 'en_stem', 'flying' );
 lexize

 {fli}

And :

test=# select * from pg_ts_dict where dict_name ~ '^(fr|en)';
 dict_name |       dict_init       |   dict_initoption    |              dict_lexize              |        dict_comment
-----------+-----------------------+----------------------+---------------------------------------+------------------------------
 en_stem   | snb_en_init(internal) | contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
 fr        | dinit_fr(internal)    |                      | snb_lexize(internal,internal,integer) | Snowball stemmer for French


test=# select lexize( 'fr', 'voyageur' );
server closed the connection unexpectedly

BLAM ! Try something else :

test=# UPDATE pg_ts_dict SET 
dict_initoption='/usr/share/postgresql/8.2/contrib/french.stop' WHERE 
dict_name = 'fr';

UPDATE 1
test=# select lexize( 'fr', 'voyageur' );
server closed the connection unexpectedly

Try other options :

dict_name   | fr_ispell
dict_init   | spell_init(internal)
dict_initoption | 
DictFile=/usr/share/postgresql/8.2/contrib/french.dict,AffFile=/usr/share/postgresql/8.2/contrib/french.aff,StopFile=/usr/share/postgresql/8.2/contrib/french.stop 


dict_lexize | spell_lexize(internal,internal,integer)
dict_comment|

test=# select lexize( 'en_stem', 'traveler' ), lexize( 'fr_ispell', 
'voyageur' );

-[ RECORD 1 ]---
lexize | {travel}
lexize | {voyageuse}

Now it works (kinda) but stemming doesn't stem for French (since 
snowball is out). It should return 'voyage' (=travel) instead of 
'voyageuse' (=female traveler)

That's not what I want; I want to use Snowball to stem French words.

I'm going to make a debug build and try to debug it, but if anyone 
can help, you're really, really welcome.






---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Tsearch2 crashes my backend, ouch !

2007-03-30 Thread Teodor Sigaev

(copy french stem.c and stem.h from the snowball website)
Take french stemmer from 
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/stemmer/stemmer_utf8_french.tar.gz

At least, it works for me.

Sorry, but Snowball's interfaces change quickly and unpredictably, and Snowball doesn't 
use version marks or anything similar. So a Snowball core and stemmers downloaded at 
different times may be incompatible :(.


Our tsearch_core patch (which moves tsearch into the PostgreSQL core) solves that problem - 
it contains all available Snowball stemmers.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-22 Thread Teodor Sigaev
Solved, see the attached patch. I found an old Celeron-300 box and installed Windows 
on it, and it was very slow :)




Nope, same result with this patch.

Thank you.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/
*** ./contrib/tsearch2.orig/./wordparser/parser.c   Thu Mar 22 18:39:23 2007
--- ./contrib/tsearch2/./wordparser/parser.cThu Mar 22 18:51:23 2007
***
*** 117,123 
{
if (lc_ctype_is_c())
{
!   unsigned int c = *(unsigned int*)(prs->wstr + prs->state->poschar);
  
/*
 * any non-ascii symbol with multibyte encoding
--- 117,123 
{
if (lc_ctype_is_c())
{
!   unsigned int c = *(prs->wstr + prs->state->poschar);
  
/*
 * any non-ascii symbol with multibyte encoding

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-21 Thread Teodor Sigaev
I can't reproduce your problem, but I don't have a Windows box. Can anybody else 
reproduce it?



contrib_regression=# select version();
                                            version
--------------------------------------------------------------------------------------------------
 PostgreSQL 8.2.3 on i386-unknown-freebsd6.2, compiled by GCC gcc (GCC) 3.4.6 [FreeBSD] 20060305
(1 row)

contrib_regression=#  show server_encoding ;
 server_encoding
-----------------
 UTF8
(1 row)

contrib_regression=# show lc_collate;
 lc_collate
------------
 C
(1 row)

contrib_regression=# show lc_ctype;
 lc_ctype
----------
 C
(1 row)

contrib_regression=# select to_tsvector('test text');
    to_tsvector
-------------------
 'test':1 'text':2
(1 row)

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-21 Thread Teodor Sigaev

8.2 has a fully rewritten text parser based on the POSIX is*() functions.

Thomas Pundt wrote:

On Wednesday 21 March 2007 14:25, Teodor Sigaev wrote:
| I can't reproduce your problem, but I have not Windows box, can anybody
| reproduce that?

just a shot in the dark; I once had a similar phenomenon and tracked it down
to a non-breaking space character (0xA0). Since then I've been patching the
tsearch2 lexer:

--- postgresql-8.1.5/contrib/tsearch2/wordparser/parser.l
+++ postgresql-8.1.4/contrib/tsearch2/wordparser/parser.l
@@ -78,8 +78,8 @@
 /* cyrillic koi8 char */
 CYRALNUM   [0-9\200-\377]
 CYRALPHA   [\200-\377]
-ALPHA  [a-zA-Z\200-\377]
-ALNUM  [0-9a-zA-Z\200-\377]
+ALPHA  [a-zA-Z\200-\237\241-\377]
+ALNUM  [0-9a-zA-Z\200-\237\241-\377]
 
 
 HOSTNAME   ([-_[:alnum:]]+\.)+[[:alpha:]]+

@@ -307,7 +307,7 @@
return UWORD; 
 }
 
-[ \r\n\t]+ {
+[ \240\r\n\t]+ {
token = tsearch2_yytext;
tokenlen = tsearch2_yyleng;
return SPACE;


Ciao,
Thomas



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] multi terabyte fulltext searching

2007-03-21 Thread Teodor Sigaev
I'm afraid that full-text search over a multi-terabyte set of documents cannot be 
implemented on any RDBMS, at least not on a single box. Specialized full-text search 
engines (with exact matching and search times of about one second) have a practical 
limit of around 20 million documents, or around 100 million for a cluster. Bigger collections 
require engines like Google's.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] multi terabyte fulltext searching

2007-03-21 Thread Teodor Sigaev
I am currently using GIST indexes because I receive about 10GB of new 
data a week (then again I am not deleting any information).  The do not 
expect to be able to stop receiving text for about 5 years, so the data 
is not going to become static any time soon.  The reason I am concerned 
with performance is that I am providing a search system for several 
newspapers since essentially the beginning of time.  Many bibliographer 
etc would like to use this utility but if each search takes too long I 
am not going to be able to support many concurrent users.


Use GiST and GIN indexes together: keep any data older than one month (which doesn't 
change) under a GIN index and new data under GiST, and once per month move data 
from GiST to GIN.
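
A rough sketch of that split (all names hypothetical: an archive table and a current table, both with a tsvector column "fti" and the default tsearch2 operator classes installed):

create index articles_archive_fti_idx on articles_archive using gin  (fti);
create index articles_recent_fti_idx  on articles_recent  using gist (fti);

-- once per month, move the aged rows under the GIN index:
insert into articles_archive
    select * from articles_recent where published < now() - interval '1 month';
delete from articles_recent where published < now() - interval '1 month';

Searches then go against both tables, e.g. via UNION ALL or a view over them.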


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] to_tsvector in 8.2.3

2007-03-21 Thread Teodor Sigaev

postgres=# select to_tsvector('test text');
  to_tsvector
---
 'test text':1
(1 row)
OK, that's related to the
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/wordparser/parser.c.diff?r1=1.11;r2=1.12;f=h
commit. Thomas pointed out that it can be a non-breaking space (0xA0), and that commit 
assumes that, with the C locale and a multibyte encoding, any character > 0x7f is alpha.

To check the theory, please apply the attached patch.

If so, I'm puzzled: we cannot assume that 0xA0 is a space character in every 
multibyte encoding, even on Windows.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/
*** ./contrib/tsearch2/wordparser/parser.c.orig Wed Mar 21 20:41:23 2007
--- ./contrib/tsearch2/wordparser/parser.c  Wed Mar 21 21:10:39 2007
***
*** 124,130 
--- 124,134 
 * with C-locale is an alpha character
 */
if ( c > 0x7f )
+   {
+   if ( c == 0xa0 )
+   return 0;
return 1;
+   }
  
return isalnum(0xff & c);
}
***
*** 157,163 
--- 161,171 
 * with C-locale is an alpha character
 */
if ( c > 0x7f )
+   {
+   if ( c == 0xa0 )
+   return 0;
return 1;
+   }
  
return isalpha(0xff & c);
}

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] tsearch2: word position

2007-02-22 Thread Teodor Sigaev
to_tsvector() could as well return the character number or a byte 
pointer, I could see advantages for both. But the word number makes 
little sense to me.


The word number is used only in the ranking functions. If you don't need ranking, then 
you can safely strip the positional information.
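
For example (tsearch2 functions; the output is only shown roughly):

select to_tsvector('default', 'fat cats ate rats');
-- roughly: 'ate':3 'cat':2 'fat':1 'rat':4     (positions = word numbers)

select strip(to_tsvector('default', 'fat cats ate rats'));
-- roughly: 'ate' 'cat' 'fat' 'rat'             (no positions, no ranking possible)

select rank(to_tsvector('default', 'fat cats ate rats'),
            to_tsquery('default', 'fat & rat'));
-- rank() consumes the stored word numbers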



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] tsearch2: word position

2007-02-22 Thread Teodor Sigaev
Huh? I explicitly *want* positional information. But I find the word 
number to be less useful than a character number or a simple (byte) 
pointer to the position of the word in the string.


Given only the word number, I have to go and parse the string again.


The byte offset of a word is useless for ranking purposes.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] tsearch2: word position

2007-02-22 Thread Teodor Sigaev



No, the first X aren't more important, but being able to determine
word proximity is very important for partial phrase matching and
ranking.  The closer the words, the better the match, all else being
equal.

exactly

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] tsearch2: word position

2007-02-21 Thread Teodor Sigaev
I'm fiddling with to_tsvector() and parse() from tsearch2, trying to get 
the word position from those functions. I'd like to use the tsearch2 
parser and stemmer, but I need to know the exact position of the word as 
well as the original, unstemmed word.


That's not the intended usage... Why do you need it?

And this only tells me a word position, not a character or byte position 
within the string. Is there a way to get this information from tsearch2?


Have a look at the headline framework as an example or starting point. hlparsetext() 
returns the parsed text with the lexemes matched by the tsquery marked. A short 
description of hlparsetext() is placed at 
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/HOWTO-parser-tsearch2.html
near the end. The description of the HLWORD struct is somewhat out of date, sorry.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Having performance problems with TSearch2

2007-02-20 Thread Teodor Sigaev

Use GIN index instead of GiST

I have a table of books, with 120 registers. I have created an GIST 
index over the title and subtitle,
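
A minimal sketch of the change, assuming the tsvector column is called "fti" on the "books" table (hypothetical names) and the default tsearch2 operator classes are installed:

drop index books_fti_gist_idx;   -- the old GiST index
create index books_fti_gin_idx on books using gin (fti);

GIN is slower to update but much faster to search, which fits data that is mostly read.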

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] intarray index vs gin index

2007-02-09 Thread Teodor Sigaev
intarray. My question is whether I still should use intarray for 
indexing (if yes then either I should use GIST or GIN) or maybe GIN 
index is faster than GIST+intarray / GIN+intarray.

Yes; with intarray you can use whichever of the GiST/GIN indexes you wish.
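
For example (assuming a hypothetical table "messages" with an integer[] column "sections"):

create index messages_sections_gist on messages using gist (sections gist__int_ops);
create index messages_sections_gin  on messages using gin  (sections gin__int_ops);

-- both support the intarray operators, e.g.
select * from messages where sections @@ '1&(2|3)'::query_int;
select * from messages where sections && '{4,5}'::int[];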

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Stats collector frozen?

2007-01-26 Thread Teodor Sigaev

Apparantly there is a bug lurking somewhere in pgwin32_select(). Because
if I put a #undef select right before the select in pgstat.c, the
regression tests pass. 


Maybe the problem is related to the fixed bug in pgwin32_waitforsinglesocket()?
WaitForMultipleObjectsEx might sleep indefinitely while waiting for a socket to become writable, 
so perhaps there is a symmetrical problem with reads? Or is pgwin32_select() used 
for waiting on writes too?




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Avoiding empty queries in tsearch

2007-01-15 Thread Teodor Sigaev

contrib_regression=# select numnode( plainto_tsquery('the any') );
NOTICE:  query contains only stopword(s) or doesn't contain lexeme(s), ignored
 numnode
---------
       0
(1 row)

contrib_regression=# select numnode( plainto_tsquery('the table') );
 numnode
---------
       1
(1 row)

contrib_regression=# select numnode( plainto_tsquery('long table') );
 numnode
---------
       3
(1 row)
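
So a guard can look roughly like this (hypothetical table "docs" with a tsvector column "fti"); in practice the check is usually done application-side before issuing the search:

select *
  from docs
 where numnode(plainto_tsquery('long table')) > 0
   and fti @@ plainto_tsquery('long table');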


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Avoiding empty queries in tsearch

2007-01-15 Thread Teodor Sigaev



Doug Cole wrote:
That sounds perfect, but it doesn't seem to exist on either of the 
postgresql installations I have access to (8.1 on ubuntu and fedora 
core).  Is it new to 8.2?  Is there a similar function under 8.1, or at 

Yes, it's new in 8.2


least a decent work-around?  Thanks for the help,
Doug


It's not a nice workaround, but it works:

# create or replace function isvoid(tsquery)
returns bool as $$
select case when $1 is NULL then 't'::bool
            when length(textin(tsquery_out($1))) = 0 then 't'::bool
            else 'f'::bool end;
$$ language SQL called on null input;


# select isvoid( plainto_tsquery('the  any') );
NOTICE:  query contains only stopword(s) or doesn't contain lexeme(s), ignored
 isvoid
--------
 t
(1 row)



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Tsearch2 default locale on postgres 8.2

2006-12-22 Thread Teodor Sigaev

set_curcfg() only works for the current session.
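
So either call it again after every connect, or name the configuration explicitly so the query does not depend on session state, e.g.:

select set_curcfg('default');                        -- this session only
select to_tsvector('default', 'some text to index'); -- no session state needed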

Tarabas (Manuel Rorarius) wrote:

Hi!

   I am having a tsearch2 problem on postgres 8.2 again ...
   when I try to set the default config for tsearch2 with

   select set_curcfg('default'); it works fine in the same pgadmin
   session when i use select show_curcfg(); afterwards. The correct
   OID is shown.

   If i then close the query window and open a new one and then try
   the select show_curcfg(); again, it states
   ERROR:  could not find tsearch config by locale

   Any idea why the configuration is not saved correctly?

Best regards
Manuel ...


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] PG 8.2.0 - TSearch2 Wrong affix file format

2006-12-21 Thread Teodor Sigaev

Please send me your dict and aff files.

By the way, tsearch2 has changed significantly, so the better way to upgrade is to 
restore only your data schema. Tsearch2 itself should be installed from contrib.


Tarabas (Manuel Rorarius) wrote:

Hi!

  I have a problem migrating my Database using TSearch2 with the UTF-8
  Backport from 8.1.3 to a new database with 8.2.0.
  
  I successfully installed postgres and the TSearch2 distributed with

  it and copied the german.aff/german.med/german.stop and
  german.stop.ispell from my old postgres 8.1.3 installation to the
  same location in the 8.2.0 install.

  Then I dumped the old database with
  
  ./pg_dump database -f backup-file and restored it on the

  8.2.0 database successfully without errors.

  I am using UTF-8 database and files for .aff/.med/.stop/.stop.ispell

  When I now try a TSearch2 Command like

  SELECT set_curdict('de_ispell');

  I get the error
  
  ERROR:  Wrong affix file format although the file was not changed

  and worked fine on the 8.1.3 database with the UTF-8 backport patch
  from 8.2.0.

  Anyone have any idea how to fix the files so they will work with
  8.2.0 also? The files seem to be ok and are UTF-8 encoded.

Best regards
Manuel


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] PG 8.2.0 - TSearch2 Wrong affix file format

2006-12-21 Thread Teodor Sigaev

The affix file has an artifact:
PFX G N 1
   . GE

which is a strange mix of the OpenOffice format and the ispell format. Just remove those lines.

The 8.2 ispell code checks the format more strictly than previous versions did, even the 
backported ones :)

Tarabas (Manuel Rorarius) wrote:

Hi,

TS Send to me dict and aff file, pls

see attached .aff file, I have not created the file from a dict
myself but taken the .aff/.med/.stop/.stop.ispell from this blog:

http://www.tauceti.net/roller/cetixx/category/Tipps ...

TS By the way, tsearch2 is changed significantly, so, the better way to update 
is a
TS restoring only your dataschema. Tsearch2 should be installed from contrib.

That's what I did ... I used the tsearch delivered with 8.2.0 contrib
for the install. Only the Schema and Data for my database was imported with the
restore from the old system, so that should all be setup correctly :-)

I also tested the correct tsearch2 install by removing all lines from the .aff
file and the error then vanishes and the search works, but without the
.aff I guess a key feature is missing :-)

Best regards
Manuel





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] TSearch2 Changeset 25387

2006-12-21 Thread Teodor Sigaev
Are you trying to convert the OpenOffice (MySpell) format to ispell with the help of 
my2ispell?


I think I see the problem: my2ispell doesn't convert prefixes that cannot 
be combined with every word ('N' in MySpell). So the resulting ispell file will contain 
a wrong line beginning with PFX...


I'll fix that.

Hannes Dorbath wrote:

http://projects.commandprompt.com/public/pgsql/changeset/25387

Though I'm probably start going on Oleg's nerves.. :/

I'm still trying to get compound word support for my dictionaries back, 
while migrating from 8.1.5-gin-utf8 to 8.2.


Can someone give me additional information on that change? My affix file 
trigger that oldFormat condition on line 472. Where is the change in 
affix file format documented? What has changed? Any way to convert them?


I found some OpenOffice pages about it, but I failed to find what I'm 
looking for.


IIRC I had TSearch2 with my `oldFormat' files working on an older 
8.2-dev-snapshot.



Thanks for any hint.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] TSearch2 Changeset 25387

2006-12-21 Thread Teodor Sigaev

Hmm, 2.0.1. But what's the difference? I don't follow the changes in OpenOffice 
closely.



Hannes Dorbath wrote:
What version of OpenOffice MySpell dictionaries is supposed to work with 
TSearch in 8.2?


The format used till OpenOffice 2.0.1 or the format starting from 2.0.2?


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] TSearch2 Changeset 25387

2006-12-21 Thread Teodor Sigaev

Oh, I see. So only 2.0.1, and I can't change that for the 8.2 branch. :(

Hannes Dorbath wrote:

On 21.12.2006 18:32, Teodor Sigaev wrote:
Are you trying to convert openoffice (myspell) format to ispell with 
help of my2ispell?


Yes:

http://groups.google.com/group/pgsql.general/browse_thread/thread/c21872aca3754a06/3a909c0e1f05a5af 



I'm really unsure what someone is supposed to do, to get compound word 
support in 8.2 working.


http://projects.commandprompt.com/public/pgsql/changeset/25387

In the comment it is stated that for German one should still use 
my2ispell. I had no luck with that.


One the other hand:

http://wiki.services.openoffice.org/wiki/Dictionaries#German_.28Germany.2C_29 



tells that the new MySpell dicts, starting from OO 2.0.2, should be fine 
for compound word support.


Thanks for your time.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Speed of postgres compared to ms sql, is this

2006-12-05 Thread Teodor Sigaev

These sorts of reports would be far more helpful if they contained some
specifics.  What queries does MSSQL do better than Postgres, exactly?


Our OR-patch was inspired by a customer of ours migrating from MS SQL to Postgres. 
Next, index support for IS NULL. And there is a huge difference in performance 
for queries like

select * from a,b where a.f = b.f or ( a.f is null and b.f is null)

NULL handling is fast in MS SQL because MS SQL doesn't follow the SQL standard: an index 
in MS SQL treats (NULL = NULL) as true.
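
The same NULL-matching join condition can also be written with the SQL-standard construct (just a sketch; whether a given release can use an index for it is a separate question):

select * from a, b where a.f is not distinct from b.f;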



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


  1   2   >