Re: [GENERAL] MongoDB 3.2 beating Postgres 9.5.1?
> CREATE INDEX json_tables_idx ON json_tables USING GIN (data jsonb_path_ops);
>
> Bitmap Heap Scan on json_tables (cost=113.50..37914.64 rows=1 width=1261) (actual time=2157.118..1259550.327 rows=909091 loops=1)
>   Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
>   Rows Removed by Index Recheck: 4360296
>   Heap Blocks: exact=37031 lossy=872059

Hmm, it looks like work_mem is too small, because the lossy heap block count is far too big.

--
Teodor Sigaev    E-mail: teo...@sigaev.ru
                 WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
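When work_mem is too small to hold one bit per heap tuple, the tid bitmap degrades to one bit per page ("lossy"), and every tuple on those pages must be rechecked. A minimal sketch of the remedy (the 256MB figure is an illustrative assumption, not a value from the thread):

```sql
-- Session-local bump so the tid bitmap can stay exact:
SET work_mem = '256MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM json_tables
WHERE data @> '{"name": "AC3 Case Red"}'::jsonb;
-- With enough work_mem, "Heap Blocks" should report only exact=...,
-- and "Rows Removed by Index Recheck" should drop sharply.
```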
Re: [GENERAL] Test CMake build
I tried it on FreeBSD 64-bit, 16GB RAM, SSD, Core i7:

  ( ./configure && gmake all; )  168,99s user 15,46s system  97% cpu 3:09,61 total
  ( cmake . && gmake all; )       75,11s user 11,34s system 100% cpu 1:26,30 total

CMake is two times faster, which is good, but I don't understand why. Which optimization level does the cmake build use by default? Which compiler does it pick? It's not obvious, because the cmake build hides the actual compiler command line.

Yury, please bring back the check target...
Re: [GENERAL] Test CMake build
Teodor Sigaev wrote:
> I tried it on FreeBSD 64-bit, 16GB RAM, SSD, Core i7:
>   ( ./configure && gmake all; )  168,99s user 15,46s system  97% cpu 3:09,61 total
>   ( cmake . && gmake all; )       75,11s user 11,34s system 100% cpu 1:26,30 total

  ( CFLAGS='-O2' cmake . && gmake all; )  141,87s user 12,18s system 97% cpu 2:37,40 total

Oops: the cmake default target is compiled with -O0. With -O2, cmake is still faster, but not by nearly as much.

> CMake is two times faster, which is good, but I don't understand why. Which optimization level does the cmake build use by default? Which compiler does it pick? It's not obvious, because the cmake build hides the actual compiler command line.
>
> Yury, please bring back the check target...
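The -O0 default is standard CMake behavior: when CMAKE_BUILD_TYPE is unset, no optimization flag is passed at all. A sketch of how to make the flags explicit and visible (generic CMake usage, not commands from this patch set):

```shell
# Pick an explicit build type; Release implies -O3 -DNDEBUG for GCC/Clang.
cmake -DCMAKE_BUILD_TYPE=Release .

# Show the full compiler command lines instead of the terse progress output:
gmake VERBOSE=1 all

# Ask CMake which compiler it chose:
grep CMAKE_C_COMPILER CMakeCache.txt
```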
Re: [GENERAL] Test CMake build
> Hm, I don't think having the compile/link lines hidden is acceptable. Many times we need to debug some compile problem, and that output is mandatory.

+1. Although it can be worked around with VERBOSE=1 make.
Re: [GENERAL] jsonb value retrieval performance
> does it read the whole jsonb tree structure into memory to get to v1, or does it have some optimization so it reads only v1 instead of the whole structure?

It reads, detoasts and decompresses the whole value, and then executes the search. An idea for fixing that is to read the jsonb value only in the chunks that are needed.
Re: [GENERAL] jsonb value retrieval performance
> and I am trying to get a value via jsonb->parentKey->childKey. It seems it is very slow. Would it actually be faster to use only the top-level key and parse it on the client side?

I suppose most of the time is spent decompressing the huge value, not on the actual search inside the jsonb. If so, we need to implement some search method which decompresses only some chunks of the jsonb. Could you send me an example of that jsonb?
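Both the nested lookup and a whole-document fetch have to detoast and decompress the full value first, so if decompression dominates they should take about the same time. A sketch for comparing the two (table and key names are hypothetical):

```sql
-- Server-side extraction of one nested key:
SELECT data -> 'parentKey' ->> 'childKey' FROM docs;

-- Whole document, to be parsed client-side instead:
SELECT data FROM docs;
```

If the two run in similar time, decompression is the bottleneck rather than the in-memory jsonb search.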
Re: [GENERAL] jsonb value retrieval performance
> Suppose most of the time is spent decompressing the huge value, not on the actual search inside the jsonb. If so, we need to implement some search method which decompresses only some chunks of the jsonb.

On an artificial example:

  %SAMP IMAGE     FUNCTION          CALLERS
   92.9 postgres  pglz_decompress   toast_decompress_datum
Re: [GENERAL] Prefix search on all hstore values
Hi!

Full-text search has this feature:

# select to_tsvector('en_name=>yes, fr_name=>oui'::hstore::text) @@ 'en:*';
 ?column?
----------
 t

or (indexing only the keys):

# select to_tsvector(akeys('en_name=>yes, fr_name=>oui'::hstore)::text) @@ 'en:*';
 ?column?
----------
 t

To speed up these queries, use functional indexes.

Albert Chern wrote:
> Hi,
> I have an hstore column that stores a string in several arbitrary languages, so something like this:
>   "en" => string in english, "zh" => string in chinese, "fr" => string in french
> Is it possible to construct an index that can be used to determine if a query string is a prefix of ANY of the values in the hstore? From reading the documentation the closest I've gotten is a gin index after converting the values to an array, but that doesn't seem to work with prefix searching. Any pointers would be much appreciated!
>
> Thanks,
> Albert
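A sketch of the functional-index approach for the keys-only variant (the table name, column name and the 'simple' configuration are assumptions):

```sql
-- Hypothetical table "places" with hstore column "names":
CREATE INDEX places_names_fts_idx ON places
    USING GIN (to_tsvector('simple', akeys(names)::text));

-- A prefix query that can use the index; the indexed expression
-- must be repeated verbatim in the WHERE clause:
SELECT *
FROM places
WHERE to_tsvector('simple', akeys(names)::text) @@ to_tsquery('simple', 'en:*');
```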
Re: [GENERAL] Prefix search on all hstore values
> My requirements can be relaxed to full-text search, but the problem I had with that approach is that I have strings in Chinese, and Postgres doesn't seem to support it. Calling to_tsvector() on Chinese characters always returns an empty vector.

Hm, check your locale settings. AFAIK, somebody does use FTS with the Chinese language.
Re: [GENERAL] Incorrect FTS query results with GIN index
> Basically, I started testing prefix matching in FTS and got into trouble. Self-contained example follows:

Thank you, fixed. The reason was an incorrect optimization of the GIN scan: GIN reuses the scan result for equal keys, but the comparison of keys didn't take the difference in the scans' strategies into account.
Re: [GENERAL] Incorrect FTS query results with GIN index
> Great, thank you! I assume this one goes into 8.4.3, right?

Yeah, or apply the patch:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/gin/ginscan.c?r1=1.25&r2=1.26
Re: [GENERAL] Incorrect FTS query results with GIN index
Thank you for the report; I will look into it this weekend.

Vyacheslav Kalinin wrote:
> Hello,
> Basically, I started testing prefix matching in FTS and got into trouble. Self-contained example follows:
Re: [GENERAL] Vacuumdb Fails: Huge Tuple
APseudoUtopia <apseudouto...@gmail.com> writes:
> Here's what happened:
> $ vacuumdb --all --full --analyze --no-password
> vacuumdb: vacuuming database "postgres"
> vacuumdb: vacuuming database "web_main"
> vacuumdb: vacuuming of database "web_main" failed: ERROR: huge tuple
> PostgreSQL 8.4.0 on i386-portbld-freebsd7.2, compiled by GCC cc (GCC) 4.2.1 20070719 [FreeBSD], 32-bit

Please apply the attached patch. The patch increases the max size from approximately 500 bytes up to 2700 bytes, so vacuum will be able to finish.

> This is evidently coming out of ginHeapTupleFastCollect because it's formed a GIN tuple that is too large (either too long a word, or too many postings, or both). I'd say that this represents a serious degradation in usability from pre-8.4 releases: before, you would have gotten the error upon attempting to insert the table row that triggers the problem. Now, with the fast insert stuff, you don't find out until VACUUM fails, and you have no idea where the bad data is. Not cool. Oleg, Teodor, what can we do about this? Can we split an oversize tuple into multiple entries? Can we apply suitable size checks before instead of after the fast-insert queue?

Both ginHeapTupleFastCollect and ginEntryInsert checked the tuple's size against TOAST_INDEX_TARGET, but ginHeapTupleFastCollect did its check with one ItemPointer fewer than ginEntryInsert. So ginHeapTupleFastCollect could produce a tuple 6 bytes larger than ginEntryInsert allows, and ginEntryInsert is what runs during pending-list cleanup. The patch removes the check against TOAST_INDEX_TARGET and checks only against GinMaxItemSize, which is greater than TOAST_INDEX_TARGET. All size checks are now in GinFormTuple.

patch.gz
Description: Unix tar archive
Re: [GENERAL] Vacuumdb Fails: Huge Tuple
> Looks reasonable, although since the error is potentially user-facing I think we should put a bit more effort into the error message (use ereport and make it mention the index name, at least --- is there any other useful information we could give?)

Only the sizes, as is done in btree, I suppose.

> Will you apply this, or do you want me to?

I'm not able to write a good error message in good English :(
Re: [GENERAL] FILLFACTOR for GIN indexes in 8.3.7
> it seems that I should reduce the fill factor of some FTS indexes, but what is the default?

> The other index methods use fillfactor in different but roughly analogous ways; the default fillfactor varies between methods.

Actually, GIN doesn't use it at all.
Re: [GENERAL] Full text index not being used
> I tried to create an index including all of the fields I query on to see if that would work, but I get an error that the index row is too large:
> => create index master_index on source_listings(geo_lat, geo_lon, price, bedrooms, region, city, listing_type, to_tsvector('english', full_listing), post_time);

That's not a full-text index; btree doesn't support the @@ operation. Read carefully: http://www.postgresql.org/docs/8.3/static/textsearch.html, and about full-text indexes: http://www.postgresql.org/docs/8.3/static/textsearch-tables.html and http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html
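What the linked docs recommend is a GIN index over just the tsvector expression (column names taken from the quoted CREATE INDEX; this is a sketch, not tested against that schema):

```sql
-- Supports WHERE to_tsvector('english', full_listing) @@ to_tsquery(...):
CREATE INDEX source_listings_fts_idx ON source_listings
    USING GIN (to_tsvector('english', full_listing));
```

The scalar columns can stay in separate btree indexes; the planner can combine them with the GIN index via a bitmap AND.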
Re: [GENERAL] Text search segmentation fault
Could you provide a backtrace? Do you use an unchanged norwegian.stop file? I'm not able to reproduce the bug; postgres just works.

Tommy Gildseth wrote:
> While trying to create a new dictionary for use with PostgreSQL text search, I get a segfault. My Postgres version is 8.3.5.
Re: [GENERAL] Text search segmentation fault
> How do I make a backtrace?

- If you have a core dump, just execute
    gdb /PATH1/postgres /PATH2/core
  and type bt. Linux doesn't write core files by default, so allow it with ulimit -c unlimited for the postgresql user.

- Or connect to the db and attach gdb to the backend process:
    gdb /PATH1/postgres BACKEND_PID
  type cont in gdb, then execute the CREATE DICTIONARY, and type bt in gdb.

Teodor Sigaev wrote:
> Could you provide a backtrace? Do you use an unchanged norwegian.stop file? I'm not able to reproduce the bug; postgres just works.

Tommy Gildseth wrote:
> While trying to create a new dictionary for use with PostgreSQL text search, I get a segfault. My Postgres version is 8.3.5.
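The two workflows above, condensed into one session sketch (all paths and the PID are placeholders):

```shell
# 1) Post-mortem from a core dump; enable cores first as the postgresql user:
ulimit -c unlimited
gdb /PATH1/postgres /PATH2/core
# (gdb) bt

# 2) Live backend: get the PID of your session's backend with
#      SELECT pg_backend_pid();   -- in psql
# then attach, continue, trigger the crash, and get the trace:
gdb /PATH1/postgres 12345         # 12345 = the backend PID
# (gdb) cont
#       ... run the CREATE DICTIONARY statement in the psql session ...
# (gdb) bt
```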
Re: [GENERAL] Text search segmentation fault
I reproduced the bug with the help of Grzegorz's pointer for a 64-bit box. So, the patch is attached and I'm going to commit it.

*** src/backend/tsearch/spell.c.orig	2009-01-29 18:18:03.0 +0300
--- src/backend/tsearch/spell.c	2009-01-29 18:20:09.0 +0300
***************
*** 521,527 ****
  				 (errcode(ERRCODE_CONFIG_FILE_ERROR),
  				  errmsg("multibyte flag character is not allowed")));
! 	Conf->flagval[(unsigned int) *s] = (unsigned char) val;
  	Conf->usecompound = true;
  }
--- 521,527 ----
  				 (errcode(ERRCODE_CONFIG_FILE_ERROR),
  				  errmsg("multibyte flag character is not allowed")));
! 	Conf->flagval[*(unsigned char *) s] = (unsigned char) val;
  	Conf->usecompound = true;
  }
***************
*** 654,660 ****
  			ptr = repl + (ptr - prepl) + 1;
  			while (*ptr)
  			{
! 				aflg |= Conf->flagval[(unsigned int) *ptr];
  				ptr++;
  			}
  		}
--- 654,660 ----
  			ptr = repl + (ptr - prepl) + 1;
  			while (*ptr)
  			{
! 				aflg |= Conf->flagval[*(unsigned char *) ptr];
  				ptr++;
  			}
  		}
***************
*** 735,741 ****
  			if (*s && pg_mblen(s) == 1)
  			{
! 				Conf->flagval[(unsigned int) *s] = FF_COMPOUNDFLAG;
  				Conf->usecompound = true;
  			}
  			oldformat = true;
--- 735,741 ----
  			if (*s && pg_mblen(s) == 1)
  			{
! 				Conf->flagval[*(unsigned char *) s] = FF_COMPOUNDFLAG;
  				Conf->usecompound = true;
  			}
  			oldformat = true;
***************
*** 791,797 ****
  					 (errcode(ERRCODE_CONFIG_FILE_ERROR),
  					  errmsg("multibyte flag character is not allowed")));
! 			flag = (unsigned char) *s;
  			goto nextline;
  		}
  		if (STRNCMP(recoded, "COMPOUNDFLAG") == 0 || STRNCMP(recoded, "COMPOUNDMIN") == 0 ||
--- 791,797 ----
  					 (errcode(ERRCODE_CONFIG_FILE_ERROR),
  					  errmsg("multibyte flag character is not allowed")));
! 			flag = *(unsigned char *) s;
  			goto nextline;
  		}
  		if (STRNCMP(recoded, "COMPOUNDFLAG") == 0 || STRNCMP(recoded, "COMPOUNDMIN") == 0 ||
***************
*** 851,857 ****
  		while (str && *str)
  		{
! 			flag |= Conf->flagval[(unsigned int) *str];
  			str++;
  		}
--- 851,857 ----
  		while (str && *str)
  		{
! 			flag |= Conf->flagval[*(unsigned char *) str];
  			str++;
  		}
Re: [GENERAL] Text search segmentation fault
> Then I have quite a few notes about that function:
> - affix is not checked on entry, and should be unsigned,

Could be Assert(affix >= 0 && affix < Conf->nAffixData)

> - for the sake of safety, uint32_t should be used instead of unsigned int in the cast

See the patch.

> - there should be some safety limit for the length of str

It's a C string.
Re: [GENERAL] Text search segmentation fault
Tom Lane wrote:
> Teodor Sigaev <teo...@sigaev.ru> writes:
>> I reproduced the bug with the help of Grzegorz's pointer for a 64-bit box.
> Hmm, seems it's not so much a 64 bit error as a signed vs unsigned char issue?

Yes, but I don't understand why it worked on a 32-bit box.

> Does this affect the old contrib/tsearch2 code?

Will check.

> Please try to make the commits in the next eight hours, as we have release wraps scheduled for tonight.

Minor versions, or the beta of 8.4? If the latter, I'd like to commit btree_gin and fast_update_gin. For both patches all the issues pointed out have been resolved, and Jeff, it seems, has no objections.
Re: [GENERAL] Text search segmentation fault
> To be honest, looking through that file, I am quite worried about a few points. I don't know too much about the insides of ispell, but I see a few suspicious things in mkSPNode too. I generally don't want to get involved in reviewing code for stuff I don't know, but if Teodor (and Oleg) don't mind, I can raise my points and see if anything useful comes out of it.

If you see a bug/mistake/suspicious point, please don't be quiet.

> Also, about that patch - it doesn't seem to apply cleanly to 8.4; perhaps that file has changed too much (I based my 'review' above on 8.4's code)

Will tweak.
Re: [GENERAL] Text search segmentation fault
> char issue? Does this affect the old contrib/tsearch2 code?

Checked: no, that was an improvement for 8.3 :).
Re: [GENERAL] very long update gin index troubles back?
> No matter if I drop the trigger that updates agg's content, and despite the fact that I'm just updating d, postgresql will update the index?

Yes, due to MVCC. An update of a row produces a new version (tuple), and the new version must be indexed just like the old one was.
Re: [GENERAL] very long update gin index troubles back?
Are you going to answer him anything? He puts it so murkily that I couldn't understand a damn thing.

Ivan Sergio Borgonovo wrote:
> I've a table that contains a tsvector that is indexed (gin), and triggers to update the tsvector, which should then update the index. This gin index has always been problematic: recreation and updates were very slow. Now I had to update 1M rows of that table, but only for columns that don't involve the tsvector, so I dropped the trigger that updates the tsvector; when rows get updated the trigger won't be called, so things should be faster... but it is still taking forever.
>
> begin;
> set constraints all deferred;
> select * from FT1IDX_trigger_drop();
> update catalog_items set APrice=p.PrezzoA, BPrice=p.PrezzoB
>   from import.catalog_prices p where catalog_items.ItemID=p.id;
> select * from FT1IDX_trigger_create();
> commit;
>
> Functions are used since I actually have 2 triggers that I drop and create. Is there anything wrong in the above that makes this update so slow on a 2x Xeon 3.2GHz, 4GB RAM and a RAID1 [sic]? I know it is slow on writes.
Re: [GENERAL] very long update gin index troubles back?
The GIN index is slow to update by construction. When you update rows, with or without the columns indexed by GIN, postgres will (in most cases) insert new row versions, so index insertion will occur anyway. So, for large updates it's much cheaper to drop and re-create the index. That was one of the reasons to develop the fast_insert_gin patch, which is now in the review process.

Ivan Sergio Borgonovo wrote:
> I've a table that contains a tsvector that is indexed (gin), and triggers to update the tsvector, which should then update the index.
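Applied to the bulk update quoted in this thread, the drop-and-recreate pattern looks roughly like this (the index name and the tsvector column are assumptions):

```sql
BEGIN;
DROP INDEX catalog_items_fts_idx;          -- hypothetical GIN index name
UPDATE catalog_items
   SET APrice = p.PrezzoA, BPrice = p.PrezzoB
  FROM import.catalog_prices p
 WHERE catalog_items.ItemID = p.id;
CREATE INDEX catalog_items_fts_idx
    ON catalog_items USING GIN (tsv);      -- tsv: the indexed tsvector column
COMMIT;
```

Note that DROP INDEX takes an exclusive lock on the table that is held until COMMIT, so other sessions are blocked for the whole transaction.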
Re: [GENERAL] [TextSearch] syntax error while parsing affix file
> iconv -f windows-1251 -t utf-8 bulgarian.dic > bulgarian_utf8.dict
> iconv -f windows-1251 -t utf-8 bulgarian.aff > bulgarian_utf8.affix
> The locale of the database is fr_FR, and its encoding is UTF8.

I believe that the characters 'И' and 'А' (non-ASCII) and the other Cyrillic ones are not acceptable for a French locale :(
Re: [GENERAL] [TextSearch] syntax error while parsing affix file
> I am using Postgres 8.3.5, and I am trying to install a Bulgarian ISpell dictionary (the OpenOffice one) for the text search features.
>
> flag *A:
>     . > А    (this is line 24)
>     . > АТА
>     . > И
>     . > ИТЕ

OpenOffice or ISpell? Please provide:
- a link to download the dictionary
- the locale and encoding settings of your db
Re: [GENERAL] How to reduce impact of a query.
> The machine in question is a 1GB RAM AMD64 box with RAID-1 SATA disks. The non-standard parts of my postgresql.conf are as follows:
>   max_connections = 100
>   shared_buffers = 128MB
>   work_mem = 4MB
>   maintenance_work_mem = 256MB
>   max_fsm_pages = 204800
>   max_fsm_relations = 1500
> Any tips appreciated.

Please show:
1) effective_cache_size
2) the query
3) the output of EXPLAIN ANALYZE of the query
Re: [GENERAL] still gin index creation takes forever
> Yeah, I'm not convinced either. Still, Teodor's theory should be easily testable: set synchronize_seqscans to FALSE and see if the problem goes away.

Test suite to reproduce the problem:

DROP TABLE IF EXISTS foo;
DROP TABLE IF EXISTS footmp;
CREATE OR REPLACE FUNCTION gen_array()
RETURNS _int4 AS
$$
    SELECT ARRAY(
        SELECT (random()*1000)::int
        FROM generate_series(1, 10+(random()*90)::int)
    )
$$ LANGUAGE SQL VOLATILE;
SELECT gen_array() AS v INTO foo FROM generate_series(1,10);
VACUUM ANALYZE foo;
CREATE INDEX fooidx ON foo USING gin (v);
DROP INDEX fooidx;
SELECT * INTO footmp FROM foo LIMIT 9;
CREATE INDEX fooidx ON foo USING gin (v);
DROP INDEX fooidx;

On my notebook with HEAD and default postgresql.conf it produces (showing only the interesting part):

postgres=# CREATE INDEX fooidx ON foo USING gin (v);
Time: 14961,409 ms
postgres=# SELECT * INTO footmp FROM foo LIMIT 9;
postgres=# CREATE INDEX fooidx ON foo USING gin (v);
LOG:  checkpoints are occurring too frequently (12 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
CREATE INDEX
Time: 56286,507 ms

So, creation time is 4 times longer after the select.

Without "SELECT * INTO footmp FROM foo LIMIT 9;":

postgres=# CREATE INDEX fooidx ON foo USING gin (v);
CREATE INDEX
Time: 13894,050 ms
postgres=# CREATE INDEX fooidx ON foo USING gin (v);
LOG:  checkpoints are occurring too frequently (14 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
CREATE INDEX
Time: 15087,348 ms

Nearly the same time. With synchronize_seqscans = off and the SELECT:

postgres=# CREATE INDEX fooidx ON foo USING gin (v);
CREATE INDEX
Time: 14452,024 ms
postgres=# SELECT * INTO footmp FROM foo LIMIT 9;
postgres=# CREATE INDEX fooidx ON foo USING gin (v);
LOG:  checkpoints are occurring too frequently (16 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
CREATE INDEX
Time: 14557,750 ms

Again, nearly the same time.
Re: [GENERAL] still gin index creation takes forever
> We could extend IndexBuildHeapScan's API to support that, but I'm not quite convinced that this is the issue.

That extension might be useful for a bitmap index too, to simplify the index creation process.
Re: [GENERAL] still gin index creation takes forever
> changing it; I've applied a patch for that. I'm still not quite convinced that Ivan isn't seeing some other issue though.

Thank you.

> In the meantime, I noticed something odd while experimenting with your test case: when running with default maintenance_work_mem = 16MB, there is a slowdown of 3x or 4x for the un-ordered case, just as you say. But at maintenance_work_mem = 200MB I see very little difference. This doesn't make sense to me --- it seems like a larger workspace should result in more difference because of greater chance to dump a lot of tuples into the index at once. Do you know why that's happening?

I suppose that if maintenance_work_mem is big enough, then all of the index data accumulates in memory and is written to disk at once. With that test's options, the size of the index is 40MB.
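So a build whose whole data set fits in maintenance_work_mem never pays the per-ItemPointer insertion cost at all, which would hide the scan-ordering effect. A session-level sketch using the figures from this thread:

```sql
-- 40MB of index data fits entirely within 200MB of workspace,
-- so the build is written out in one pass regardless of scan order:
SET maintenance_work_mem = '200MB';
CREATE INDEX fooidx ON foo USING gin (v);
```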
Re: [GENERAL] still gin index creation takes forever
>> GIN's build algorithm can use bulk insert of ItemPointers if and only if they are to be inserted on the rightmost page (the exact piece of code is dataPlaceToPage() in gindatapage.c, lines 407-427)
>
> I'm not following. Rightmost page of what --- it can't be the whole index, can it, or the case would hardly ever apply?

A GIN index contains a btree over the keys (the entry tree), and for each key it contains either a list of ItemPointers (a posting list) or a btree over ItemPointers (a posting tree, or data tree), depending on their number. The bulk insertion process collects keys and sorted arrays of ItemPointers in memory, and then, for each key, it tries to insert every ItemPointer from the array into the corresponding data tree one by one. But if the smallest ItemPointer in the array is greater than the biggest stored one, then the algorithm inserts the whole array on the rightmost page of the data tree. So in that case the process can insert about 1000 ItemPointers per data-tree lookup; in the opposite case it does 1000 lookups in the data tree.
Re: [GENERAL] still gin index creation takes forever
>> Any suggestion about how to track down the problem?
>
> What you are describing sounds rather like a use-of-uninitialized-memory problem, wherein the behavior depends on what happened to be in that memory previously. If so, using a debug/cassert-enabled build of Postgres might help to make the behavior more reproducible.

It seems to me that a possible reason for that behavior could be the order of the table scan. GIN's build algorithm prefers a scan from the beginning to the end of the table, but in 8.3 that's not always what happens: the scan may begin from the middle or the end of the table, depending on the sequential scan history. GIN's build algorithm can use bulk insert of ItemPointers if and only if they are to be inserted on the rightmost page (the exact piece of code is dataPlaceToPage() in gindatapage.c, lines 407-427). Is there any way to force the table scan to start from the beginning?
Re: [GENERAL] Weird problem concerning tsearch functions built into postgres 8.3, assistance requested
> One of the tables we're using in the 8.1.3 setups currently running includes phone numbers as a searchable field (fti_phone), with the results of a select on the field generally looking like this: 'MMM':2 'NNNN':3 'MMM-NNNN':1, where MMM is the first three digits and NNNN the fourth through seventh. The weird part is this: on the old systems running 8.1.3, I can look up a record by fti_phone using any of the three above items: first three, last four, or the entire number including the dash. On the new system running 8.3.1, I can do a lookup by the first three or the last four and get the results I'm after, but any attempt to do a lookup by the entire MMM-NNNN version returns no records.

The parser was changed:

postgres=# select * from ts_debug('123-4567');
 alias | description      | token | dictionaries | dictionary | lexemes
-------+------------------+-------+--------------+------------+---------
 uint  | Unsigned integer | 123   | {simple}     | simple     | {123}
 int   | Signed integer   | -4567 | {simple}     | simple     | {-4567}
(2 rows)

postgres=# select * from ts_debug('abc-defj');
 alias           | description                     | token    | dictionaries   | dictionary   | lexemes
-----------------+---------------------------------+----------+----------------+--------------+------------
 asciihword      | Hyphenated word, all ASCII      | abc-defj | {english_stem} | english_stem | {abc-defj}
 hword_asciipart | Hyphenated word part, all ASCII | abc      | {english_stem} | english_stem | {abc}
 blank           | Space symbols                   | -        | {}             |              |
 hword_asciipart | Hyphenated word part, all ASCII | defj     | {english_stem} | english_stem | {defj}

The parser in 8.1 treats any [alnum]+-[alnum]+ as a hyphenated word, but 8.3 treats [digit]+-[digit]+ as two separate numbers. So you can either pre-process the texts before indexing, or have a look at the regex dictionary (http://vo.astronet.ru/arxiv/dict_regex.html).
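One pre-processing approach is to index, alongside the original text, a copy of the number with the hyphen stripped, so the full number becomes a single token again (the 'simple' configuration and the query shape are assumptions, not from the original thread):

```sql
-- Appends '1234567' as one extra token after whatever the parser produces:
SELECT to_tsvector('simple',
       '123-4567' || ' ' || replace('123-4567', '-', ''));

-- Full-number queries then apply the same transformation:
-- ... WHERE fti_phone @@ plainto_tsquery('simple', replace('123-4567', '-', ''));
```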
Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4
Fixed, patch attached.

diff -c -r src.orig/backend/access/gist/gistget.c src/backend/access/gist/gistget.c
*** src.orig/backend/access/gist/gistget.c	2008-10-22 12:07:39.0 +0400
--- src/backend/access/gist/gistget.c	2008-10-22 15:13:23.0 +0400
***************
*** 49,55 ****
  	for (offset = FirstOffsetNumber; offset <= maxoff; offset = OffsetNumberNext(offset))
  	{
! 		IndexTuple	ituple = (IndexTuple) PageGetItem(p, PageGetItemId(p, offset));
  
  		if (ItemPointerEquals(&(ituple->t_tid), iptr))
  		{
--- 49,55 ----
  	for (offset = FirstOffsetNumber; offset <= maxoff; offset = OffsetNumberNext(offset))
  	{
! 		IndexTuple	ituple = (IndexTuple) PageGetItem(p, PageGetItemId(p, offset));
  
  		if (ItemPointerEquals(&(ituple->t_tid), iptr))
  		{
***************
*** 157,163 ****
  {
  	while (ntids < maxtids && so->curPageData < so->nPageData)
  	{
! 		tids[ntids] = scan->xs_ctup.t_self = so->pageData[so->curPageData];
  
  		so->curPageData++;
  		ntids++;
--- 157,167 ----
  {
  	while (ntids < maxtids && so->curPageData < so->nPageData)
  	{
! 		tids[ntids] = scan->xs_ctup.t_self = so->pageData[so->curPageData].heapPtr;
! 		ItemPointerSet(&(so->curpos),
! 					   BufferGetBlockNumber(so->curbuf),
! 					   so->pageData[so->curPageData].pageOffset);
  
  		so->curPageData++;
  		ntids++;
***************
*** 251,258 ****
  {
  	while (ntids < maxtids && so->curPageData < so->nPageData)
  	{
! 		tids[ntids] = scan->xs_ctup.t_self = so->pageData[so->curPageData];
  		so->curPageData++;
  		ntids++;
  	}
--- 255,267 ----
  {
  	while (ntids < maxtids && so->curPageData < so->nPageData)
  	{
! 		tids[ntids] = scan->xs_ctup.t_self = so->pageData[so->curPageData].heapPtr;
! 		ItemPointerSet(&(so->curpos),
! 					   BufferGetBlockNumber(so->curbuf),
! 					   so->pageData[so->curPageData].pageOffset);
  		so->curPageData++;
  		ntids++;
  	}
***************
*** 297,309 ****
  	 * we can efficiently resume the index scan later.
  	 */
- 	ItemPointerSet(&(so->curpos),
- 				   BufferGetBlockNumber(so->curbuf), n);
- 
  	if (!(ignore_killed_tuples && ItemIdIsDead(PageGetItemId(p, n))))
  	{
  		it = (IndexTuple) PageGetItem(p, PageGetItemId(p, n));
! 		so->pageData[so->nPageData] = it->t_tid;
  		so->nPageData++;
  	}
  }
--- 306,316 ----
  	 * we can efficiently resume the index scan later.
  	 */
  	if (!(ignore_killed_tuples && ItemIdIsDead(PageGetItemId(p, n))))
  	{
  		it = (IndexTuple) PageGetItem(p, PageGetItemId(p, n));
! 		so->pageData[so->nPageData].heapPtr = it->t_tid;
! 		so->pageData[so->nPageData].pageOffset = n;
  		so->nPageData++;
  	}
  }
diff -c -r src.orig/backend/access/gist/gistscan.c src/backend/access/gist/gistscan.c
*** src.orig/backend/access/gist/gistscan.c	2008-10-22 12:07:39.0 +0400
--- src/backend/access/gist/gistscan.c	2008-10-22 14:55:58.0 +0400
***************
*** 163,169 ****
  	so
Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4
> 20 hours to find the fix Teodor, Kudos !

Nothing to be proud of :(, it's my bug.

> Due to the importance of the fix, will we see an 8.3.5 very soon?

Don't know, see the discussion. I think it would make sense.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4
Thank you, I reproduced the bug and will fix it.

Sergey Konoplev wrote:
Ok, I've done the test case (see attachment). 8.3.3 has passed it. 8.3.4 hasn't passed in ~99% of the times I ran it.

Steps to reproduce:
1. install pg 8.3.4, do initdb, start pg
2. correct PSQL parameter in pg-8.3.4_index_update_test.sh
3. run pg-8.3.4_index_update_test.sh a few times

And you will see something like this:
...
-- 2nd time obtaining seq-scan count and plan... --
SELECT table1_flag, count(*) FROM table1 GROUP BY table1_flag;
 table1_flag | count
-------------+-------
           1 |   100
(1 row)

EXPLAIN ANALYZE SELECT table1_flag, count(*) FROM table1 GROUP BY table1_flag;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=15.00..15.01 rows=1 width=2) (actual time=0.140..0.140 rows=1 loops=1)
   ->  Seq Scan on table1  (cost=0.00..12.00 rows=600 width=2) (actual time=0.004..0.059 rows=100 loops=1)
 Total runtime: 0.172 ms
(3 rows)

-- 2nd time obtaining index-scan count and plan... --
SELECT count(*) FROM table1 WHERE table1_flag = 1;
 count
-------
    98
(1 row)

EXPLAIN ANALYZE SELECT count(*) FROM table1 WHERE table1_flag = 1;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.27..8.28 rows=1 width=0) (actual time=0.451..0.451 rows=1 loops=1)
   ->  Index Scan using i_table1__table1_point on table1  (cost=0.00..8.27 rows=1 width=0) (actual time=0.011..0.408 rows=98 loops=1)
 Total runtime: 0.477 ms
(3 rows)

-- Regards, Sergey Konoplev -- PostgreSQL articles in english & russian http://gray-hemp.blogspot.com/search/label/postgresql/

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4
> Hmm. So the problem seems to be statable as a full-index scan on a GIST index might fail to return all the rows, if the index has been modified since creation. Teodor, can you think of anything you changed recently in that area?

Only the fix for possible duplicates during an index scan. Will see.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4
> Hmm. So the problem seems to be statable as a full-index scan on a GIST index might fail to return all the rows, if the index has been modified since creation. Teodor, can you think of anything you changed recently in that area?

I still can't reproduce the bug, but I found a useless recheck condition with a bitmap scan:

drop table if exists qq;
select 1 as st, 1::int4 as t into qq from generate_series(1,1) as t;
create index qqidx on qq using gist (st) where t >= 1;
INSERT INTO qq (SELECT (4 * random())::int4, (4 * random())::int4 from generate_series(1,1));

# explain select t, count(1) from qq where t >= 1 group by t;
                               QUERY PLAN
-------------------------------------------------------------------------
 GroupAggregate  (cost=19.62..633.49 rows=1 width=2)
   ->  Bitmap Heap Scan on qq  (cost=19.62..630.28 rows=640 width=2)
         Recheck Cond: (t >= 1)
         ->  Bitmap Index Scan on qqidx  (cost=0.00..19.46 rows=640 width=0)

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Using ISpell dictionary - headaches...
> It *may* be because I'm using psql 8.0.3 and not the latest version (but I'm stuck with that version), i'm just hoping that one of you have met

Upgrade to 8.0.17 - there were several fixes in the ISpell code.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] changing text search treatment of puncutation
> In general there seem to be a lot of ways that people wish they could tweak the text search parser, and telling them to write their own parser isn't a very helpful response for most folk. I don't have an idea about how to improve the situation, but it seems like something that should be thought about.

We (with Oleg) have thought hard about it, and we haven't found a solution yet. A configurable parser should be:
- fast
- flexible
- not error-prone
- comfortable to use for a non-programmer (at least for a non-C programmer)

It might be a table-driven state machine (just put TParserStateAction into table(s), with some caching for the first step), but that is complex to operate, and the correctness of any change to the states would need to be proven before it could be put into use.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] multi-word expression full-text searching
> If I understand well the plainto_tsquery behaviour, this query match with: Despite this, the president went out. Despite the event, this question arise.

Right, you mean phrase search. Read the thread: http://archives.postgresql.org/pgsql-hackers/2008-05/msg0.php

The suggested patch should be made into a module, I think.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] multi-word expression full-text searching
SELECT id FROM document WHERE to_tsvector('english',text) @@ plainto_tsquery('english','despite this'); -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Fragments in tsearch2 headline
[moved to -hackers, because the talk is about implementation details]

> I've ported the patch of Sushant Sinha for fragmented headlines to pg8.3.1 (http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php)

Thank you.

1. diff -Nrub postgresql-8.3.1-orig/contrib/tsearch2/tsearch2.c: contrib/tsearch2 is now a compatibility layer for old applications, and they don't know about new features. So, this part isn't needed.

2. Compiling the function (ts_headline_with_fragments) into core but using it only from a contrib module looks very odd. So, the new feature could be used only through the compatibility layer for the old release :)

3. headline_with_fragments() is hardcoded to use the default parser, but what happens when a configuration uses another parser? For example, for the Japanese language.

4. I would prefer the signature ts_headline( [regconfig,] text, tsquery [,text] ), and the function should accept 'NumFragments=N' for the default parser. Other parsers may use other options.

5. It just doesn't work correctly, because the new code doesn't take care of parser-specific lexeme types:

contrib_regression=# select headline_with_fragments('english', 'wow asd-wow wow', 'asd', '');
      headline_with_fragments
-----------------------------------
 ...wow asd-wow<b>asd</b>-wow wow
(1 row)

So, I am inclined to use the existing framework/infrastructure, although it may be subject to change. Some description:

1. ts_headline determines the correct parser to use;
2. it calls hlparsetext to split the text into a structure suitable for both goals: finding the best fragment(s) and concatenating those fragment(s) back into a text representation;
3. it calls the parser-specific method prsheadline, which works with the pre-parsed text (parsing was done in hlparsetext). The method should mark the needed words/parts/lexemes etc.;
4. ts_headline glues the fragments into text and returns that.

We need a parser's headline method because only the parser knows everything about its lexemes.
-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] tsearch2 on-demand dictionary loading using functions in tsearch2
> * Considering the database is loaded separately for each session, does this also imply that each running backend has a separate dictionary stored in memory?

Yes.

> As for downsides, I only really see two: * Tracking updates of dictionaries - but it's reasonable to believe that new connections get open more often than the dictionary gets updated. Also, this might be easily solved by stat()-ing the dictionary file before starting up session, and only have the server reload it if there's a notified change. * Possibly complicated to implement?

Keeping the dictionary up to date is the most difficult part here. Configuration of a dictionary might be done by an ALTER command, so the parent process (and all currently running backends) should get that information in order to reload the dictionary.

> As for my second question, is it possible to use functions in tsearch2? For example, writing my own stemmer in PL/pgSQL or in C as a postgres function.

Yes, of course, you can develop your own dictionary (-ies) and parser. But only in C, because they are critical for performance.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] tsearch2 on-demand dictionary loading using functions in tsearch2
> Hmm, good point; I presume accept the fact that settings change won't propagate to other backends until reconnect would not be acceptable behavior, even if documented along with the relevant configuration option?

I suppose so. That was one of the reasons to move tsearch into core, and it would be too regrettable to lose that feature again.

>> As for my second question, is it possible to use functions in tsearch2? For example, writing my own stemmer in PL/pgSQL or in C as a postgres function.
> I've had something different in mind. Considering there are already facilities to use functions, be it PL/pgSQL, PL/Python or C, why not just use those with the condition that the function must accept some-arguments and return some-result? Or would using this, even while using C as the language used for the actual parser, slow things down too?

The API for dictionaries and parsers intentionally uses complex (and nested) C structures to decrease overhead. While parsing text, postgres makes two parser calls per word (one call returns the word, the second the word delimiter - a space is a lexeme too, although it's not subject to indexing) and one dictionary call per word. So, if your language can work with C structures, then you can use that language with tsearch, with a greater or smaller performance penalty. PL/pgSQL doesn't have this capability.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
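[Editor's note: the two-calls-per-word behaviour described above can be observed from SQL with the in-core text search of 8.3, whose ts_parse() function exposes the raw parser output. A small illustrative sketch:]

```sql
-- Each word comes back as its own token, and the delimiter between
-- words comes back as a separate 'blank' token: a lexeme too, just
-- not subject to indexing (and never passed to the dictionaries).
SELECT tokid, token FROM ts_parse('default', 'two words');
```

The middle row of the result holds the whitespace token sitting between the two words.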
Re: [GENERAL] Direct access to GIST structure
> encode spatial proximity. Is there an API (backend C-level is fine) to access a GIST index?

The best way is to extend the existing GiST interface to support KNN search. But you can see how to get access to the index structure from a module in the gevel module (http://www.sigaev.ru/cvsweb/cvsweb.cgi/gevel/). The GiST-related functions in this module were invented to help developers, not for production use, so they acquire an exclusive lock on the index.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Direct access to GIST structure
> I just stumbled on http://www.cs.purdue.edu/spgist/ which seems like exactly what I need.

It doesn't work with 8.2 and up, because since 8.2 an index should take care of concurrent access itself, and that implementation doesn't do it.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Fragments in tsearch2 headline
> The patch takes into account the corner case of overlap. Here is the code for that:
>
> // start check
> if (!startHL && *currentpos >= startpos)
>     startHL = 1;
>
> The headline generation will not start until currentpos has gone past startpos.

Ok.

> You can also check how this headline function is working at my website indiankanoon.com. Some example queries are murder, freedom of speech, freedom of press etc.

Looks good.

> Should I develop the patch for the current cvs head of postgres?

I'd like to commit your patch, but it should be:
- against current HEAD
- an extension of the existing ts_headline.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Fragments in tsearch2 headline
> Teodor, Oleg, do we want this? http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php

I suppose we want it. But there are some questions/issues:

- Is it necessary to introduce a new function? Maybe it would be better to add an option to the existing headline function. I'd like to keep the current layout: ts_headline provides a common interface to headline generation; finding and marking fragments is the business of the parser's headline method, and generating the exact pieces of text is done by ts_headline.
- Covers may overlap, so overlapping fragments will look odd.

In any case, the patch was developed for the contrib version of tsearch.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full text index and most frequently used words
What I'd like to know is if there is an easy to way to use the full text index to generate a list of the most common words. I could write this code manually, but I'm hoping there's a better (simpler) way. For 8.3 http://www.postgresql.org/docs/8.3/static/textsearch-features.html#TEXTSEARCH-STATISTICS For versions before 8.3 just use stat() function instead of ts_stat(). -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
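[Editor's note: a sketch of the 8.3 approach linked above, assuming a table `documents` with a tsvector column `body_tsv` - adjust the names to your schema:]

```sql
-- ts_stat runs the given query and aggregates word statistics over
-- every tsvector it returns: ndoc = number of documents containing
-- the word, nentry = total number of occurrences.
SELECT word, ndoc, nentry
FROM ts_stat('SELECT body_tsv FROM documents')
ORDER BY nentry DESC, ndoc DESC
LIMIT 10;
```

For pre-8.3 tsearch2 installations, the same query shape works with stat() in place of ts_stat(), as noted in the reply.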
Re: [GENERAL] postgres 8.3 rc-1 ispell installation problem
> flag *J:    # isimo
>   E > -E, 'ISIMO     # grand'isimo   -- here 432
>   E > -E, 'ISIMOS    # grande grand'isimos
>   E > -E, 'ISIMA     # grande grand'isima
>   E > -E, 'ISIMAS    # grande grand'isimas
>   O > -O, 'ISIMO     # tonto tont'isimo
>   O > -O, 'ISIMA     # tonto tont'isima

The current implementation doesn't accept any characters in an ending except alphabetic ones.

> i think 'I.. word is not correct for ispell, this should be one Í letter

That's right, but you should convert the dictionary and affix file to UTF8 encoding.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Segmentation fault with 8.3 FTS ISpell
Fixes are committed to CVS; I hope they will help you.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Segmentation fault with 8.3 FTS ISpell
I tried to reproduce the bug, but without success. Could you provide a dump of the text column?

Hannes Dorbath wrote:
> Crash happens about 7 minutes after issuing the UPDATE statement with current CVS HEAD. The table has around 5 million rows. It's always reproducible.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Tsearch2 - spanish
> prueba1=# select to_tsvector('espanol','melón perro mordelón');
> server closed the connection unexpectedly
> This probably means the server terminated abnormally before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

Hmm, can you provide a backtrace?

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Tsearch2 - spanish
> prueba=# select to_tsvector('espanol','melón');
> ERROR: Affix parse error at 506 line
>
> and
>
> prueba=# select lexize('sp','melón');
>  lexize
> ---------
>  {melon}
> (1 row)

Looks very strange; can you provide the list of dictionaries and the configuration map?

> I tried many dictionaries with the same results. Also I changed the codeset of the .aff and .dict files (from latin1 to utf8 and utf8 to iso88591) and got the same error. Where can I investigate to resolve this problem? My dictionary at 506 line had:

Where did you get this file? And what are the encoding/locale settings of your db?

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] tsearch2 anomoly?
Ordinary text has no strict syntax rules, so the parser tries to recognize the most probable token. Something with '.', '-' and alphanumeric characters is often a filename, but a filename very rarely starts or ends with a dot.

RC Gobeille wrote:
Thanks, and I didn't know about ts_debug, so thanks for that also. For the record, I see how to use my own processing function (e.g. dropatsymbol) to get what I need: http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html

However, can you explain the logic behind the parsing difference if I just add a .s to a string:

ossdb=# select ts_debug('gallery2-httpd-2.1-conf.');
 (default,hword,Hyphenated word,gallery2-httpd-2,{simple},'2' 'httpd' 'gallery2' 'gallery2-httpd-2')
 (default,part_hword,Part of hyphenated word,gallery2,{simple},'gallery2')
 (default,lpart_hword,Latin part of hyphenated word,httpd,{en_stem},'httpd')
 (default,float,Decimal notation,2.1,{simple},'2.1')
 (default,lpart_hword,Latin part of hyphenated word,conf,{en_stem},'conf')
(5 rows)

ossdb=# select ts_debug('gallery2-httpd-2.1-conf.s');
 (default,host,Host,gallery2-httpd-2.1-conf.s,{simple},'gallery2-httpd-2.1-conf.s')
(1 row)

Thanks again, Bob

On 9/6/07 11:19 AM, Oleg Bartunov [EMAIL PROTECTED] wrote:
This is how the default parser works. See output from
  select * from ts_debug('gallery2-httpd-conf');
and
  select * from ts_debug('httpd-2.2.3-5.src.rpm');
All token types:
  select * from token_type();

On Thu, 6 Sep 2007, RC Gobeille wrote:
I'm having trouble understanding to_tsvector. (PostreSQL 8.1.9 contrib) In this first case converting 'gallery2-httpd-conf' makes sense to me and is exactly what I want. It looks like the entire string is indexed plus the substrings broken by '-' are indexed.

ossdb=# select to_tsvector('gallery2-httpd-conf');
 'conf':4 'httpd':3 'gallery2':2 'gallery2-httpd-conf':1

However, I'd expect the same to happen in the httpd example - but it does not appear to.
ossdb=# select to_tsvector('httpd-2.2.3-5.src.rpm'); to_tsvector --- 'httpd-2.2.3-5.src.rpm':1 Why don't I get: 'httpd', 'src', 'rpm', 'httpd-2.2.3-5.src.rpm' ? Is this a bug or design? Thank you! Bob Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] PickSplit method of 2 columns ... error
The page split algorithm was rewritten in 8.2 for multicolumn indexes, and the API for user-defined pickSplit functions was extended to get better results during index creation. But GiST can still interact with old functions - and that is what this message says. It doesn't mean there is a real problem or error: the index will be the same as in 8.1, just not better.

Kevin Neufeld wrote:
> Has anyone come across this error before? LOG: PickSplit method of 2 columns of index 'asset_position_lines_asset_cubespacetime_idx' doesn't support secondary split
> This is a multi-column GiST index on an integer and a cube (a data type from the postgres cube extension module). I traced the error to the gistUserPicksplit function in gistsplit.c ... I surmise that this method is called whenever a page split is necessary. So, I know when this error occurs, but I don't know why. Thoughts anyone? Cheers, Kevin

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Regression - Query requires full scan, GIN doesn't support it
> Is this a permanent limitation of GIN, or is a fix possible?

Permanent. You can check user input with the querytree() function: if it returns the string 'T', then a full scan would be needed. If your tsquery is produced by a plainto_tsquery() call, then such a query will not find any result anyway, so you can show the user an empty page.

> Is a fix being worked on? If a fix is forthcoming, will it be available in the 8.2 series or only 8.3+?

Possibly a full fix in 8.4, but I won't promise that. 8.3 will have protection from queries which don't match anything.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
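[Editor's note: a small sketch of the suggested querytree() check, using an illustrative negation-only query:]

```sql
-- querytree() strips the parts of a tsquery that an index cannot use;
-- a bare 'T' means nothing indexable is left, i.e. the query would
-- force a full scan.
SELECT querytree(to_tsquery('english', '!foo'));
--  querytree
-- -----------
--  T
```

An application can run this on the user's query first and refuse it (or show the empty page mentioned above) when the result is 'T'.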
Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile
> 2007-06-01 23:00:00.001 CEST:% LOG: GIN incomplete splits=8

Just to be sure: the patch fixes *creation* of the WAL log, not replaying. So, the primary db should be patched too.

During the weekend I found a possible deadlock in the locking protocol in GIN between concurrent UPDATE and VACUUM queries with the same GIN index involved. Strange, but I didn't see it in 8.2, and even now I can't reproduce it. It's easy to reproduce only on HEAD, with the recently added ReadBufferWithStrategy() call instead of ReadBuffer(). The ReadBufferWithStrategy() call was added to implement a limited-size ring of buffers for VACUUM. Nevertheless, it's a possible scenario in 8.2. The attached patch fixes that deadlock bug too. And the previous version of my patch had a mistake which is observable on a CREATE INDEX .. USING GIN query.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
patch_wal_gin.v6.gz Description: Unix tar archive
---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile
> 1. After a certain point, consecutive GIN index splits cause a problem. The new RHS block numbers are consecutive from 111780+

That's the newly created page. The split page might have any number.

> 2. The incomplete splits stay around indefinitely after creation and we aren't trying to remove the wrong split at any point. We're either never creating an xlog record, or we are ignoring it in recovery, or we are somehow making multiple entries then not removing all of them.

Agreed.

> 3. The root seems to move, which isn't what I personally was expecting to see. It seems root refers to the highest parent involved in the split.

"root" in this context means the parent of the split page. Actually, there is a lot of B-tree in GIN, see http://www.sigaev.ru/gin/GinStructure.pdf

> 4. We're writing lots of redo in between failed page splits. So *almost* everything is working correctly.
> 5. This starts to happen when we have very large indexes. This may be coincidental but the first relation file is fairly full (900+ MB).

Yes. It seems to me that the conditions for the error are very rare, and the B-tree over ItemPointers (the second level of GIN) has a big capacity, 1000+ items per page. So, splits occur rather rarely.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile
Ooops. The patch doesn't apply cleanly. New version attached.

> The attached patch fixes that deadlock bug too. And the previous version of my patch had a mistake which is observable on a CREATE INDEX .. USING GIN query.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
patch_wal_gin.v7.gz Description: Unix tar archive
---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile
> After some observation of massive reindexing of some hundred thousand data sets it seems to me that the slave doesn't skip checkpoints anymore. (Apart from those skipped because of the CheckpointTimeout thing) I'll keep an eye on it and report back any news on the issue.

Nice, committed. Thanks for your report and testing.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] warm standby server stops doing checkpoints afterawhile
> 2007-06-01 13:11:29.365 CEST:% DEBUG: 0: Ressource manager (13) has partial state information
> To me, this points clearly to there being an improperly completed action in resource manager 13. (GIN)
> In summary, it appears that there may be an issue with the GIN code for WAL recovery and this is affecting the Warm Standby.

Hmm. I found that gin_xlog_cleanup doesn't reset the incomplete_splits list. Is it a possible reason for the bug? The attached patch fixes it.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/

*** ./src/backend/access/gin/ginxlog.c.orig	Fri Jun 1 16:47:47 2007
--- ./src/backend/access/gin/ginxlog.c	Fri Jun 1 16:53:47 2007
***************
*** 594,599 ****
--- 594,600 ----
  	MemoryContextSwitchTo(topCtx);
  	MemoryContextDelete(opCtx);
+ 	incomplete_splits = NIL;
  }

 bool
---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] warm standby server stops doing checkpointsafterawhile
2007-06-01 16:28:51.708 CEST:% LOG: GIN incomplete split root:8 l:45303 r:111740 at redo CA/C8243C28 ... 2007-06-01 16:38:23.133 CEST:% LOG: GIN incomplete split root:8 l:45303 r:111740 at redo CA/C8243C28 Looks like a bug in GIN. I'll play with it. Can you provide more details about your test? -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile
> I'd suggest we throw an error, as shown in the enclosed patch. Frank, can you give that a whirl to provide Teodor with something more to work with? Thanks.

I have already made a test suite which reproduces the problem - it leaves incomplete splits. But I discovered one more problem: a deadlock on a buffer lock. So, right now I am investigating that.

> Neither GIST nor B-tree seems to throw an error in corresponding locations also, so the potential for not being able to track this is high. I'd want to throw errors in those locations also.

Agreed, I'll add more checks.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile
Found the reason: if the parent page is fully backed up after a child's split, then forgetIncompleteSplit() isn't called at all. I hope the attached patch fixes that. Please test it.

PS I'm going away for the weekend, so I'll not be online until Monday.

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
patch_wal_gin.gz Description: Unix tar archive
---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] TSEARCH2: disable stemming in indexes and triggers
> I found out that using 'simple' instead of 'default' when using to_tsvector() does exactly that, but I don't know how to change my triggers and indexes to keep doing the same (using 'simple').

Suppose your database is initialized with the C locale. Then just mark the simple configuration as the default:

# update pg_ts_cfg set locale=null where ts_name='default';
# update pg_ts_cfg set locale='C' where ts_name='simple';

If your locale setting is not C, then mark the needed configuration with your locale.

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] opclass for real[]
> ERROR: data type real[] has no default operator class for access method gist
> HINT: You must specify an operator class for the index or define a default operator class for the data type.

There is an operator class for GIN for real[]:
http://www.postgresql.org/docs/8.2/static/xindex.html#XINDEX-GIN-ARRAY-STRAT-TABLE

> Is there an opclass defined in 8.2 or do I have to create one? In either case can you please give a link for information on opclasses. Thanks, Abhang

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings

-- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
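[Editor's note: a short sketch of using the GIN array opclass mentioned above; the table and column names here are made up for illustration:]

```sql
-- GiST has no default opclass for real[], but GIN does (8.2+),
-- so the index is created with USING gin:
CREATE TABLE samples (id serial PRIMARY KEY, vals real[]);
CREATE INDEX samples_vals_idx ON samples USING gin (vals);

-- The GIN array opclass indexes the array overlap/containment
-- operators, e.g. "contains":
SELECT id FROM samples WHERE vals @> ARRAY[1.5::real];
```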
Re: [GENERAL] crash creating tsearch2 index
Could you provide a test suite? John DeSoi wrote: Hi, I'm trying to dump and restore a copy of a database in the same cluster. pg_restore would abort when creating a tsearch2 gist index. So I dumped to text removed the CREATE INDEX commands and tried to do that at the end after the rest of the database was loaded. I still have the same problem: CREATE INDEX song_tsx_title_idx ON song USING gist (tsx_title public.gist_tsvector_ops); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. This is pg 8.0.8 in a shared hosting environment, so I don't have a lot of options for tweaking. Is there a known work-around for this? Thanks, John DeSoi, Ph.D. http://pgedit.com/ Power Tools for PostgreSQL ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Postgresql 8.2.4 crash with tsearch2
Please check your steps or tell me where I'm wrong :) If you still have problems, I can solve it if I get access to your development server... % cd PGSQL_SRC % zcat ~/tmp/tsearch_snowball_82-20070504.gz | patch -p0 % cd contrib/tsearch2 % gmake su -c 'gmake install' gmake installcheck % cd gendict % cp ~/tmp/libstemmer_c/src_c/stem_UTF_8_french.c stem.c % cp ~/tmp/libstemmer_c/src_c/stem_UTF_8_french.h stem.h % ./config.sh -n fr -s -p french_UTF_8 -v -C'Snowball stemmer for French - UTF8' % cd ../../dict_fr % gmake su -c 'gmake install' % psql contrib_regression dict_fr.sql contrib_regression=# select lexize('fr', 'sortir'), lexize('fr', 'service'), lexize('fr', 'chose'); lexize | lexize | lexize +--+ {sort} | {servic} | {chos} (1 row) contrib_regression=# select lexize('fr', 'as'); lexize {} ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] tsearch2 dictionary that indexes substrings?
My colleague who speaks more C than me came up with the code below which works fine for us. Nice, except for the self-defined utf8 properties. I think it would be much better to use pg_mblen(char*); in that case your dictionary will work with any encoding supported by pgsql. Will the memory allocated for the lexeme be freed by the caller? Yes, of course. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] indexing array columns
you should be able to index the way you want. In contrib there is a module, cube, which does something similar to what you want for 3D; extending it to 12D shouldn't be too hard... The contrib/cube module implements an N-dimensional cube representation. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
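A minimal sketch of the cube approach (assuming contrib/cube is installed; table and column names are hypothetical, and the containment operator spelling has varied across releases, `@` in older contribs, `@>` later):

```sql
CREATE TABLE points (id serial PRIMARY KEY, c cube);

-- cube ships a GiST operator class, so the column is directly indexable:
CREATE INDEX points_c_idx ON points USING gist (c);

-- find points contained in a 3-D box; higher dimensions work the same way:
SELECT id FROM points
WHERE cube(ARRAY[0,0,0], ARRAY[1,1,1]) @> c;
```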
Re: [GENERAL] Tsearch2 crashes my backend, ouch !
Anyway, just to signal that tsearch2 crashes if SELECT is not granted on pg_ts_dict (other tables give a proper error message when not GRANTed). Fixed. Thanks for the report. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] to_tsvector in 8.2.3
Sorry, no - I tested on CVS HEAD, so the dll isn't compatible :( Wait a bit for 8.2.4 richardcraig wrote: Teodor As a non-C windows user (yes - throw stones at me :) ) Do you have a fixed dll for this patch that I can try? Thanks Richard Teodor Sigaev-2 wrote: Solved, see attached patch. I had found an old Celeron-300 box and installed Windows on it, and it was very slow :) Nope, same result with this patch. Thank you. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ *** ./contrib/tsearch2.orig/./wordparser/parser.c Thu Mar 22 18:39:23 2007 --- ./contrib/tsearch2/./wordparser/parser.c Thu Mar 22 18:51:23 2007 *** *** 117,123 { if (lc_ctype_is_c()) { ! unsigned int c = *(unsigned int*)(prs->wstr + prs->state->poschar); /* * any non-ascii symbol with multibyte encoding --- 117,123 { if (lc_ctype_is_c()) { ! unsigned int c = *(prs->wstr + prs->state->poschar); /* * any non-ascii symbol with multibyte encoding ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Tsearch2 crashes my backend, ouch !
which version of pgsql exactly? Listmail wrote: Hello, I have just ditched Gentoo and installed a brand new kubuntu system (was tired of the endless compiles). I have a problem with crashing tsearch2. This appeared both on Gentoo and the brand new kubuntu. I will describe all my install procedure, maybe I'm doing something wrong. Cluster is newly created and empty. initdb was done with UNICODE encoding locales. # from postgresql.conf # These settings are initialized by initdb -- they might be changed lc_messages = 'fr_FR.UTF-8' # locale for system error message strings lc_monetary = 'fr_FR.UTF-8' # locale for monetary formatting lc_numeric = 'fr_FR.UTF-8' # locale for number formatting lc_time = 'fr_FR.UTF-8' # locale for time formatting [EMAIL PROTECTED]:~$ locale LANG=fr_FR.UTF-8 LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=fr_FR.UTF-8 etc... First import needed .sql files from contrib and check that the default tsearch2 config works for English $ createdb -U postgres test $ psql -U postgres test tsearch2.sql and other contribs I use $ psql -U postgres test test=# select lexize( 'en_stem', 'flying' ); lexize {fli} test=# select to_tsvector('default', 'flying ducks'); to_tsvector -- 'fli':1 'duck':2 OK, seems to work very nicely, now install French. Since this is Kubuntu there is no source, so download source, then : - apply patch_tsearch_snowball_82 from tsearch2 website ./configure --prefix=/usr/lib/postgresql/8.2/ --datadir=/usr/share/postgresql/8.2 --enable-nls=fr --with-python cd contrib/tsearch2 make cd gendict (copy french stem.c and stem.h from the snowball website) ./config.sh -n fr -s -p french_UTF_8 -i -v -c stem.c -h stem.h -C'Snowball stemmer for French' cd ../../dict_fr make clean make sudo make install Now we have : /bin/sh ../../config/install-sh -c -m 644 dict_fr.sql '/usr/share/postgresql/8.2/contrib' /bin/sh ../../config/install-sh -c -m 755 libdict_fr.so.0.0 '/usr/lib/postgresql/8.2/lib/dict_fr.so' Okay... 
- download and install UTF8 french dictionaries from http://www.davidgis.fr/download/tsearch2_french_files.zip and put them in contrib directory (the files delivered by debian package ifrench are ISO8859, bleh) - import french shared libs psql -U postgres test /usr/share/postgresql/8.2/contrib/dict_fr.sql Then : test=# select lexize( 'en_stem', 'flying' ); lexize {fli} And : test=# select * from pg_ts_dict where dict_name ~ '^(fr|en)'; dict_name | dict_init | dict_initoption| dict_lexize |dict_comment ---+---+--+---+- en_stem | snb_en_init(internal) | contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball. fr| dinit_fr(internal)| | snb_lexize(internal,internal,integer) | Snowball stemmer for French test=# select lexize( 'fr', 'voyageur' ); server closed the connection unexpectedly BLAM ! Try something else : test=# UPDATE pg_ts_dict SET dict_initoption='/usr/share/postgresql/8.2/contrib/french.stop' WHERE dict_name = 'fr'; UPDATE 1 test=# select lexize( 'fr', 'voyageur' ); server closed the connection unexpectedly Try other options : dict_name | fr_ispell dict_init | spell_init(internal) dict_initoption | DictFile=/usr/share/postgresql/8.2/contrib/french.dict,AffFile=/usr/share/postgresql/8.2/contrib/french.aff,StopFile=/usr/share/postgresql/8.2/contrib/french.stop dict_lexize | spell_lexize(internal,internal,integer) dict_comment| test=# select lexize( 'en_stem', 'traveler' ), lexize( 'fr_ispell', 'voyageur' ); -[ RECORD 1 ]--- lexize | {travel} lexize | {voyageuse} Now it works (kinda) but stemming doesn't stem for French (since snowball is out). It should return 'voyage' (=travel) instead of 'voyageuse' (=female traveler). That's not what I want; I want to use snowball to stem French words. I'm going to make a debug build and try to debug it, but if anyone can help, you're really, really welcome. 
---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [GENERAL] Tsearch2 crashes my backend, ouch !
(copy french stem.c and stem.h from the snowball website) Take the french stemmer from http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/stemmer/stemmer_utf8_french.tar.gz At least, it works for me. Sorry, but Snowball's interfaces change very quickly and unpredictably, and Snowball doesn't use a version mark or anything similar. So a Snowball core and stemmers downloaded at different times may be incompatible :(. Our tsearch_core patch (moving tsearch into the core of pgsql) solves that problem - it contains all possible snowball stemmers. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] to_tsvector in 8.2.3
Solved, see attached patch. I had found an old Celeron-300 box and installed Windows on it, and it was very slow :) Nope, same result with this patch. Thank you. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ *** ./contrib/tsearch2.orig/./wordparser/parser.c Thu Mar 22 18:39:23 2007 --- ./contrib/tsearch2/./wordparser/parser.c Thu Mar 22 18:51:23 2007 *** *** 117,123 { if (lc_ctype_is_c()) { ! unsigned int c = *(unsigned int*)(prs->wstr + prs->state->poschar); /* * any non-ascii symbol with multibyte encoding --- 117,123 { if (lc_ctype_is_c()) { ! unsigned int c = *(prs->wstr + prs->state->poschar); /* * any non-ascii symbol with multibyte encoding ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] to_tsvector in 8.2.3
I can't reproduce your problem, but I don't have a Windows box - can anybody reproduce that? contrib_regression=# select version(); version PostgreSQL 8.2.3 on i386-unknown-freebsd6.2, compiled by GCC gcc (GCC) 3.4.6 [FreeBSD] 20060305 (1 row) contrib_regression=# show server_encoding ; server_encoding - UTF8 (1 row) contrib_regression=# show lc_collate; lc_collate C (1 row) contrib_regression=# show lc_ctype; lc_ctype -- C (1 row) contrib_regression=# select to_tsvector('test text'); to_tsvector --- 'test':1 'text':2 (1 row) -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] to_tsvector in 8.2.3
8.2 has a fully rewritten text parser based on POSIX is* functions. Thomas Pundt wrote: On Wednesday 21 March 2007 14:25, Teodor Sigaev wrote: | I can't reproduce your problem, but I have not Windows box, can anybody | reproduce that? just a wild guess; I once had a similar phenomenon and tracked it down to a non-breaking space character (0xA0). Since then I'm patching the tsearch2 lexer: --- postgresql-8.1.5/contrib/tsearch2/wordparser/parser.l +++ postgresql-8.1.4/contrib/tsearch2/wordparser/parser.l @@ -78,8 +78,8 @@ /* cyrillic koi8 char */ CYRALNUM [0-9\200-\377] CYRALPHA [\200-\377] -ALPHA [a-zA-Z\200-\377] -ALNUM [0-9a-zA-Z\200-\377] +ALPHA [a-zA-Z\200-\237\241-\377] +ALNUM [0-9a-zA-Z\200-\237\241-\377] HOSTNAME ([-_[:alnum:]]+\.)+[[:alpha:]]+ @@ -307,7 +307,7 @@ return UWORD; } -[ \r\n\t]+ { +[ \240\r\n\t]+ { token = tsearch2_yytext; tokenlen = tsearch2_yyleng; return SPACE; Ciao, Thomas -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] multi terabyte fulltext searching
I'm afraid that fulltext search on a multi-terabyte set of documents can not be implemented on any RDBMS, at least on a single box. Specialized fulltext search engines (with exact matching and a time to search of about one second) have a practical limit near 20 million docs; a cluster, near 100 million. Bigger collections require engines like Google. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] multi terabyte fulltext searching
I am currently using GIST indexes because I receive about 10GB of new data a week (then again, I am not deleting any information). I do not expect to be able to stop receiving text for about 5 years, so the data is not going to become static any time soon. The reason I am concerned with performance is that I am providing a search system for several newspapers going back essentially to the beginning of time. Many bibliographers etc. would like to use this utility, but if each search takes too long I am not going to be able to support many concurrent users. Use GiST and GIN indexes together: any data older than one month (which doesn't change) with a GIN index, and new data with GiST. And once per month, move data from GiST to GIN. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
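The suggested split could look roughly like this (a sketch; table and column names are hypothetical):

```sql
-- static archive, changed once a month: GIN is faster to search
CREATE INDEX doc_archive_fts_idx ON doc_archive USING gin (tsv);

-- fresh data, updated constantly: GiST copes better with frequent inserts
CREATE INDEX doc_recent_fts_idx ON doc_recent USING gist (tsv);

-- monthly move of settled rows from the GiST table to the GIN table:
INSERT INTO doc_archive
    SELECT * FROM doc_recent
    WHERE received < now() - interval '1 month';
DELETE FROM doc_recent
    WHERE received < now() - interval '1 month';
```

Queries would then search both tables (e.g. via UNION ALL or a parent view), so each index only covers the workload it is best at.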
Re: [GENERAL] to_tsvector in 8.2.3
postgres=# select to_tsvector('test text'); to_tsvector --- 'test text':1 (1 row) Ok. that's related to http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/wordparser/parser.c.diff?r1=1.11;r2=1.12;f=h commit. Thomas pointed that it can be non-breakable space (0xa0) and that commit assumes any character with C locale and multibyte encoding and 0x7f is alpha. To check theory, pls, apply attached patch. If so, I'm confused, we can not assume that 0xa0 is a space symbol in any multibyte encoding, even in Windows. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ *** ./contrib/tsearch2/wordparser/parser.c.orig Wed Mar 21 20:41:23 2007 --- ./contrib/tsearch2/wordparser/parser.c Wed Mar 21 21:10:39 2007 *** *** 124,130 --- 124,134 * with C-locale is an alpha character */ if ( c 0x7f ) + { + if ( c == 0xa0 ) + return 0; return 1; + } return isalnum(0xff c); } *** *** 157,163 --- 161,171 * with C-locale is an alpha character */ if ( c 0x7f ) + { + if ( c == 0xa0 ) + return 0; return 1; + } return isalpha(0xff c); } ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] tsearch2: word position
to_tsvector() could as well return the character number or a byte pointer, I could see advantages for both. But the word number makes little sense to me. The word number is used only in the ranking functions. If you don't need ranking, then you can safely strip the positional information. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
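The word numbers under discussion are the positions visible in a tsvector (shown here with the built-in FTS syntax of later releases; tsearch2 takes an extra configuration argument):

```sql
SELECT to_tsvector('the quick brown fox');
-- yields something like: 'brown':3 'fox':4 'quick':2
-- the numbers are word positions counted over the original token stream
-- (stopwords still advance the counter), not character or byte offsets
```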
Re: [GENERAL] tsearch2: word position
Huh? I explicitly *want* positional information. But I find the word number to be less useful than a character number or a simple (byte) pointer to the position of the word in the string. Given only the word number, I have to go and parse the string again. The byte offset of a word is useless for ranking purposes. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] tsearch2: word position
No, the first X aren't more important, but being able to determine word proximity is very important for partial phrase matching and ranking. The closer the words, the better the match, all else being equal. exactly ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] tsearch2: word position
I'm fiddling with to_tsvector() and parse() from tsearch2, trying to get the word position from those functions. I'd like to use the tsearch2 parser and stemmer, but I need to know the exact position of the word as well as the original, unstemmed word. That's not the intended usage... Why do you need that? And this only tells me a word position, not a character or byte position within the string. Is there a way to get this information from tsearch2? Have a look at the headline framework as an example or starting point. hlparsetext() returns the parsed text with the lexemes matched by the tsquery. A small description of hlparsetext is placed at http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/HOWTO-parser-tsearch2.html near the end. The description of the HLWORD struct is somewhat out of date, sorry. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Having performance problems with TSearch2
I have a table of books, with 120 records. I have created a GiST index over the title and subtitle, Use a GIN index instead of GiST. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
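The switch is just a change of access method in the CREATE INDEX (a sketch; the table and tsvector column names are hypothetical):

```sql
-- instead of:
--   CREATE INDEX books_fts_idx ON books USING gist (fts);
-- build:
CREATE INDEX books_fts_idx ON books USING gin (fts);
-- GIN is slower to build and update, but considerably faster to search,
-- which suits a mostly-read table of books
```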
Re: [GENERAL] intarray index vs gin index
intarray. My question is whether I should still use intarray for indexing (if yes, then should I use GiST or GIN), or maybe a plain GIN index is faster than GiST+intarray / GIN+intarray. Yes, with intarray you can use whichever of the GiST/GIN indexes you wish. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
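Both intarray opclasses are named explicitly in the CREATE INDEX (a sketch, assuming contrib/intarray is installed; the table and column names are hypothetical):

```sql
-- a table with an integer-array column:
CREATE TABLE tagged (id serial PRIMARY KEY, tag_ids int[]);

-- intarray's GiST opclass:
CREATE INDEX tagged_ids_gist_idx ON tagged USING gist (tag_ids gist__int_ops);
-- or intarray's GIN opclass:
CREATE INDEX tagged_ids_gin_idx  ON tagged USING gin  (tag_ids gin__int_ops);

-- either index supports intarray's containment/overlap operators:
SELECT id FROM tagged WHERE tag_ids @> '{23,50}'::int[];
```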
Re: [GENERAL] Stats collector frozen?
Apparently there is a bug lurking somewhere in pgwin32_select(), because if I put a #undef select right before the select in pgstat.c, the regression tests pass. Maybe the problem is related to the fixed bug in pgwin32_waitforsinglesocket()? WaitForMultipleObjectsEx might sleep indefinitely while waiting for the socket to become writable, so maybe there is a symmetrical problem with read? Or is pgwin32_select() used for waiting on write too? -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Avoiding empty queries in tsearch
contrib_regression=# select numnode( plainto_tsquery('the any') ); NOTICE: query contains only stopword(s) or doesn't contain lexeme(s), ignored numnode - 0 (1 row) contrib_regression=# select numnode( plainto_tsquery('the table') ); numnode - 1 (1 row) contrib_regression=# select numnode( plainto_tsquery('long table') ); numnode - 3 (1 row) -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
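Based on the examples above, an application can use numnode() as a guard so an all-stopword input never reaches the search itself (a sketch; the table and column names are hypothetical):

```sql
-- run the match only when the query parsed to at least one node:
SELECT *
FROM docs
WHERE numnode(plainto_tsquery('the any')) > 0
  AND fts @@ plainto_tsquery('the any');
-- with 'the any' the first condition is false, so no rows are searched
```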
Re: [GENERAL] Avoiding empty queries in tsearch
Doug Cole wrote: That sounds perfect, but it doesn't seem to exist on either of the postgresql installations I have access to (8.1 on ubuntu and fedora core). Is it new to 8.2? Is there a similar function under 8.1, or at least a decent work-around? Thanks for the help, Doug Yes, it's new in 8.2. Not a nice workaround, but it works: # create or replace function isvoid(tsquery) returns bool as $$ select case when $1 is NULL then 't'::bool when length(textin(tsquery_out( $1 ))) = 0 then 't'::bool else 'f'::bool end; $$ language SQL called on null input; # select isvoid( plainto_tsquery('the any') ); NOTICE: query contains only stopword(s) or doesn't contain lexeme(s), ignored isvoid t (1 row) -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Tsearch2 default locale on postgres 8.2
set_curcfg() works only for the current session. Tarabas (Manuel Rorarius) wrote: Hi! I am having a tsearch2 problem on postgres 8.2 again ... when I try to set the default config for tsearch2 with select set_curcfg('default'); it works fine in the same pgadmin session when I use select show_curcfg(); afterwards. The correct OID is shown. If I then close the query window, open a new one, and try select show_curcfg(); again, it states ERROR: could not find tsearch config by locale Any idea why the configuration is not saved correctly? Best regards Manuel ... ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] PG 8.2.0 - TSearch2 Wrong affix file format
Please send me the dict and aff files. By the way, tsearch2 has changed significantly, so the better way to update is to restore only your data schema; tsearch2 itself should be installed from contrib. Tarabas (Manuel Rorarius) wrote: Hi! I have a problem migrating my database using TSearch2 with the UTF-8 backport from 8.1.3 to a new database with 8.2.0. I successfully installed postgres and the TSearch2 distributed with it, and copied the german.aff/german.med/german.stop and german.stop.ispell from my old postgres 8.1.3 installation to the same location in the 8.2.0 install. Then I dumped the old database with ./pg_dump database -f backup-file and restored it on the 8.2.0 database successfully without errors. I am using a UTF-8 database and UTF-8 files for .aff/.med/.stop/.stop.ispell When I now try a TSearch2 command like SELECT set_curdict('de_ispell'); I get the error ERROR: Wrong affix file format although the file was not changed and worked fine on the 8.1.3 database with the UTF-8 backport patch from 8.2.0. Anyone have any idea how to fix the files so they will work with 8.2.0 also? The files seem to be ok and are UTF-8 encoded. Best regards Manuel ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/ -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] PG 8.2.0 - TSearch2 Wrong affix file format
The affix file has an artifact: PFX G N 1 . GE which is a strange mix of the openoffice format and the ispell format. Just remove such lines. The 8.2 Ispell code checks the format more strictly than previous versions did, even backported ones :) Tarabas (Manuel Rorarius) wrote: Hi, TS Send to me dict and aff file, pls see attached .aff file, I have not created the file from a dict myself but taken the .aff/.med/.stop/.stop.ispell from this blog: http://www.tauceti.net/roller/cetixx/category/Tipps ... TS By the way, tsearch2 is changed significantly, so, the better way to update is a TS restoring only your dataschema. Tsearch2 should be installed from contrib. That's what I did ... I used the tsearch delivered with 8.2.0 contrib for the install. Only the schema and data for my database were imported with the restore from the old system, so that should all be set up correctly :-) I also tested the correct tsearch2 install by removing all lines from the .aff file; the error then vanishes and the search works, but without the .aff I guess a key feature is missing :-) Best regards Manuel ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] TSearch2 Changeset 25387
Are you trying to convert the openoffice (myspell) format to ispell with the help of my2ispell? It seems to me I see the problem: my2ispell doesn't convert prefixes which can not be combined with every word ('N' in myspell). So the ispell file will contain a wrong line beginning with PFX... I'll fix that. Hannes Dorbath wrote: http://projects.commandprompt.com/public/pgsql/changeset/25387 Though I'm probably starting to get on Oleg's nerves.. :/ I'm still trying to get compound word support for my dictionaries back while migrating from 8.1.5-gin-utf8 to 8.2. Can someone give me additional information on that change? My affix file triggers that oldFormat condition on line 472. Where is the change in affix file format documented? What has changed? Any way to convert them? I found some OpenOffice pages about it, but I failed to find what I'm looking for. IIRC I had TSearch2 with my `oldFormat' files working on an older 8.2-dev-snapshot. Thanks for any hint. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] TSearch2 Changeset 25387
Hmm, 2.0.1. But what's the difference? I don't follow OpenOffice changes closely. Hannes Dorbath wrote: What version of OpenOffice MySpell dictionaries is supposed to work with TSearch in 8.2? The format used till OpenOffice 2.0.1 or the format starting from 2.0.2? -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] TSearch2 Changeset 25387
Oh, I see. So, only 2.0.1, and I can't change that for the 8.2 branch. :( Hannes Dorbath wrote: On 21.12.2006 18:32, Teodor Sigaev wrote: Are you trying to convert openoffice (myspell) format to ispell with help of my2ispell? Yes: http://groups.google.com/group/pgsql.general/browse_thread/thread/c21872aca3754a06/3a909c0e1f05a5af I'm really unsure what someone is supposed to do to get compound word support in 8.2 working. http://projects.commandprompt.com/public/pgsql/changeset/25387 In the comment it is stated that for German one should still use my2ispell. I had no luck with that. On the other hand: http://wiki.services.openoffice.org/wiki/Dictionaries#German_.28Germany.2C_29 tells that the new MySpell dicts, starting from OO 2.0.2, should be fine for compound word support. Thanks for your time. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Speed of postgres compared to ms sql, is this
These sorts of reports would be far more helpful if they contained some specifics. What queries does MSSQL do better than Postgres, exactly? Our OR-patch was inspired by a customer migrating from MS SQL to postgres. Next, index support for IS NULL. And there is a huge difference in performance for queries like select * from a,b where a.f = b.f or ( a.f is null and b.f is null) NULL support is fast in MS SQL because MS SQL doesn't follow the SQL standard: an index in MS SQL believes that (NULL = NULL) is true. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
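The semantic point is easy to demonstrate: under the SQL standard, NULL = NULL yields NULL, not true, which is why the explicit OR is needed in Postgres. A sketch (IS NOT DISTINCT FROM is the SQL:2003 spelling available in later PostgreSQL releases):

```sql
SELECT NULL = NULL;           -- yields NULL, not true
SELECT 1 WHERE NULL = NULL;   -- returns no rows: NULL is not "true"

-- the explicit NULL-safe join from the message above:
SELECT * FROM a, b
WHERE a.f = b.f OR (a.f IS NULL AND b.f IS NULL);

-- equivalent, more readable spelling:
SELECT * FROM a, b WHERE a.f IS NOT DISTINCT FROM b.f;
```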