Re: UdmSearch: Webboard: PostgreSQL performance
actually, PgSQL works quite effectively if you stay away from search.c and move towards something like the Perl front-end ... we switched, and the difference in response times was like going from night-day ... right now, our index database is running: %sbin/indexer -S etc/udmsearch.conf UdmSearch statistics StatusExpired Total - 0 0 6 Not indexed yet 1 0 2 Unknown status 200 0 13446 OK 301 0160 Moved Permanently 302 0 18 Moved Temporarily 304 0 89691 Not Modified 400 0 1 Bad Request 401 0 6 Unauthorized 403 0 6 Forbidden 404 0908 Not found 500 0 36 Internal Server Error 503 0 16 Service Unavailable - Total 0 104296 udmsearch=# select count(1) from ndict; count -- 12949315 (1 row) On Thu, 11 Jan 2001, gluke wrote: Author: gluke Email: [EMAIL PROTECTED] Message: If you want to use SQL backends for your search, try to use MySQL server. Our tests show that this is the fastest solution for mnoGoSearch with SQL databases. Reply: http://search.mnogo.ru/board/message.php?id=1068 __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED] Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Mnogosearch-perl: $search-work_time in mod_perl ...
On Tue, 21 Nov 2000, Dubun Guillaume wrote: Hi, The Hermit Hacker wrote: what does it represent? I've set my restop to be: !--restop-- Search results: small$W/smallHR Displaying documents $f-$l of total B$t/B found. ($SearchTime secs) !--/restop-- and the resutls are coming out as: Displaying documents 1-10 of total 458 found. (1705334.945 secs) Have you the same results without modperl or with search.cgi ? What is your mnogosearch / mnogosearch-perl version ? latest version ... and the number of secs keeps increasing, as if its time since the modperl script was loaded: Displaying documents 1-10 of total 512 found. (1733825.195 secs) ~7hrs since the last time I tested, which is about right: 1733825 - 1705334 28491 28491 / 60 474 474 / 60 7 __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: PgSQL: DELETE INDEX url_url;
That is so obviously not going to work ... about six or seven lines past the 'UdmGetToken()' call, there is an 'if' that sets query_words, which has to be passed to the LoadTemplate when that while() loop is finished :( Something I'd like to recommend is to take sql.c and split it off into db_mysql.c, db_pgsql.c, db_oracle.c, etc ... so that each of us on our different platforms can spend time to optimzie for that platform ... I'm loath to do this right now cause if I do, I'm bound to screw up something that another platform is dependant on ;( On Mon, 13 Nov 2000, Alexander Barkov wrote: The Hermit Hacker wrote: On Sat, 11 Nov 2000, Alexander Barkov wrote: The Hermit Hacker wrote: okay, can someone make the following changes to the source code, so that the search avoids using the index ... this will at least give a temporary fix until our LIKE optimizer is fixed: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=1971739852 AND url.rec_id=ndict.url_id AND ( (url.url || ' ') LIKE 'http://www.postgresql.org/% '); I don't think that this is the best solution to fix search for buggy LIKE optimizer then to fix search back for fixed optimizer. After sending this out, it looks like there might be a bug in udmsearch itself, as I went through the code itself, in 3.1.7, and it is technically coded to do this, but it isn't sql.c:1894: if(c-DBType==UDM_DB_PGSQL) sprintf(UDM_STREND(c-urlstr),"(url.url || '') LIKE '%s')",URL); else sprintf(UDM_STREND(c-urlstr),"url.url LIKE '%s')",URL); return(0); Any idea why this isn't, in fact, working? I found that the bug is in search.c Template is loaded after QUERY_STRING parsing. So, when URL limit are added, DBType is not known yet. You may just move this: if(LoadTemplate(Indexer,template,query_words,0)){ printf("htmlbodyCan't open template file '%s'!/body/html return(0); } before these lines: /* Parse Query String */ token=UdmGetToken(query_string,"",lasttok); while(token){ It should work OK, but I'm not sure. Maybe something in LoadTemplate() requires QUERY_STRING to be already parsed. Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: PgSQL: DELETE INDEX url_url;
On Sat, 11 Nov 2000, Alexander Barkov wrote: The Hermit Hacker wrote: okay, can someone make the following changes to the source code, so that the search avoids using the index ... this will at least give a temporary fix until our LIKE optimizer is fixed: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=1971739852 AND url.rec_id=ndict.url_id AND ( (url.url || ' ') LIKE 'http://www.postgresql.org/% '); I don't think that this is the best solution to fix search for buggy LIKE optimizer then to fix search back for fixed optimizer. After sending this out, it looks like there might be a bug in udmsearch itself, as I went through the code itself, in 3.1.7, and it is technically coded to do this, but it isn't sql.c:1894: if(c-DBType==UDM_DB_PGSQL) sprintf(UDM_STREND(c-urlstr),"(url.url || '') LIKE '%s')",URL); else sprintf(UDM_STREND(c-urlstr),"url.url LIKE '%s')",URL); return(0); Any idea why this isn't, in fact, working? My queries are coming out as the second of the two conditions, even though my 'connect string' looks like: DBAddr pgsql:[EMAIL PROTECTED]/udmsearch/ Is my 'connect string' wrong? Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: Re[2]: UdmSearch: PgSQL: DELETE INDEX url_url;
On Fri, 10 Nov 2000, Sergey Kartashoff wrote: Hi! Thursday, November 09, 2000, 9:55:11 PM, you wrote: THH On Thu, 9 Nov 2000, Alexander Barkov wrote: Don't forget to recreate this index before starting indexer. As far as url_url index is UNIQUE this does not allow indexer to add the same link several time. If you remove index, the same documents might be added several times. THH can this unique index not be based on the crc32 value instead? that might THH explain why I'm up to 140K docs when I was only expecting 91k :) no, index on crc32 cannot be unique, because of it will block adding site mirrors into url table. okay, can someone make the following changes to the source code, so that the search avoids using the index ... this will at least give a temporary fix until our LIKE optimizer is fixed: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=1971739852 AND url.rec_id=ndict.url_id AND ( (url.url || ' ') LIKE 'http://www.postgresql.org/% '); Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: PgSQL: DELETE INDEX url_url;
On Thu, 9 Nov 2000, Alexander Barkov wrote: Don't forget to recreate this index before starting indexer. As far as url_url index is UNIQUE this does not allow indexer to add the same link several time. If you remove index, the same documents might be added several times. can this unique index not be based on the crc32 value instead? that might explain why I'm up to 140K docs when I was only expecting 91k :) The Hermit Hacker wrote: Morning all ... Just a hint for anyone using udmsearch with PostgreSQL ... if you find performance is atrociuos, there is a relatively simple fix: DELETE INDEX url_url; I run the PostgreSQL.org site, and indexing it, with its mailing lists, has so far generated a database that currently looks like, and is growing: sbin/indexer -S etc/udmsearch.conf UdmSearch statistics StatusExpired Total - 0 77516 77722 Not indexed yet 200503 60508 OK 301 0104 Moved Permanently 302 0 27 Moved Temporarily 304 0 4179 Not Modified 401 0 2 Unauthorized 403 0 17 Forbidden 404 0666 Not found 500 0 26 Internal Server Error 503 10 50 Service Unavailable - Total 78029 143301 Due to a limitation in current PgSQL with its LIKE query optimizer, it "mis-optimizes" the query so that an EXPLAIN looks good, but the query itself will take several minutes to run on a Dual-PIII 450Mhz with fast drives ... Getting rid of the INDEX helps keep the numbers lower, but am looking forward to us getting the optimizer fixed :) Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: Re[2]: UdmSearch: New message on the WebBoard #1: File HTTPsearch / FTP search CONF file Questions
thank you very much ... so much easier to follow threads when you can read them :) On Wed, 8 Nov 2000, Sergey Kartashoff wrote: Hi! Wednesday, November 08, 2000, 2:44:15 AM, you wrote: THH is it possible to get rid of the 'UdmSearch: new message on the webboard' THH part of these subjects? Yes, i shorten this subject already. -- Regards, Sergey aka gluke. Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: PgSQL: DELETE INDEX url_url;
Morning all ... Just a hint for anyone using udmsearch with PostgreSQL ... if you find performance is atrociuos, there is a relatively simple fix: DELETE INDEX url_url; I run the PostgreSQL.org site, and indexing it, with its mailing lists, has so far generated a database that currently looks like, and is growing: sbin/indexer -S etc/udmsearch.conf UdmSearch statistics StatusExpired Total - 0 77516 77722 Not indexed yet 200503 60508 OK 301 0104 Moved Permanently 302 0 27 Moved Temporarily 304 0 4179 Not Modified 401 0 2 Unauthorized 403 0 17 Forbidden 404 0666 Not found 500 0 26 Internal Server Error 503 10 50 Service Unavailable - Total 78029 143301 Due to a limitation in current PgSQL with its LIKE query optimizer, it "mis-optimizes" the query so that an EXPLAIN looks good, but the query itself will take several minutes to run on a Dual-PIII 450Mhz with fast drives ... Getting rid of the INDEX helps keep the numbers lower, but am looking forward to us getting the optimizer fixed :) Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Re: threads not threading?
ah, might want to add that tothe man page ... it was the first place I checked :( On Mon, 6 Nov 2000, Alexander Barkov wrote: Use "indexer -Nxxx" where xxx is a number of threads to start. The Hermit Hacker wrote: Why is it, with freebsd pthreads enabled, on a dual-CPU PIII, I never see more then one URL indexed at once? Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1998-11/threads.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1998-12/index.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1998-12/threads.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1999-01/index.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1999-01/threads.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1999-02/index.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1999-02/threads.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1999-03/index.html Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Can someone explain this to me?
Why, in 3.1.7, when I do a search, do I see: [14960] SQL 0.26s: SELECT word,lang FROM stopword [14960] SQL 0.14s: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=-1274543809 AND url.rec_id=ndict.url_id AND (url.url LIKE 'http://www.postgresql.org/%') [14960] SQL 0.01s: INSERT INTO qtrack (qwords,qtime,found) VALUES ('rpm',973462820,11) [14960] SQL 0.05s: SELECT rec_id,url,content_type,last_mod_time,title,txt,docsize,last_index_time,next_index_time,referrer,keywords,description,crc32,category FROM url WHERE rec_id IN (6,8,15,26,79,119,147,150,158,166,404) ORDER BY rec_id [14960] SQL 0.01s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=-956346810 AND (status=200 OR status=304) [14960] SQL 0.01s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=-1089368843 AND (status=200 OR status=304) [14960] SQL 0.01s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=690675342 AND (status=200 OR status=304) [14960] SQL 0.00s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=967505276 AND (status=200 OR status=304) [14960] SQL 0.01s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=1672365359 AND (status=200 OR status=304) [14960] SQL 0.01s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=-762182779 AND (status=200 OR status=304) [14960] SQL 0.00s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=268438658 AND (status=200 OR status=304) [14960] SQL 0.03s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=-348977251 AND (status=200 OR status=304) [14960] SQL 0.01s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=-140427905 AND (status=200 OR status=304) [14960] SQL 0.00s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=962713810 AND (status=200 OR status=304) [14960] SQL 0.00s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=-1930052569 AND (status=200 OR status=304) Why are there those extra 11 SELECT calls after we've already gotten the information above? Why not do: SELECT rec_id,url,content_type,last_mod_time,title,txt,docsize,last_index_time,next_index_time,referrer,keywords,description,crc32,category FROM url WHERE rec_id IN (6,8,15,26,79,119,147,150,158,166,404) AND (status = 200 OR status = 304) ORDER BY rec_id And do it all in one query instead of 11? Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Re: Can someone explain this to me?
then again, if I'm only searching on one word, why the different crc32 searches? *raised eyebrow* On Sun, 5 Nov 2000, The Hermit Hacker wrote: Why, in 3.1.7, when I do a search, do I see: [14960] SQL 0.26s: SELECT word,lang FROM stopword [14960] SQL 0.14s: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=-1274543809 AND url.rec_id=ndict.url_id AND (url.url LIKE 'http://www.postgresql.org/%') [14960] SQL 0.01s: INSERT INTO qtrack (qwords,qtime,found) VALUES ('rpm',973462820,11) [14960] SQL 0.05s: SELECT rec_id,url,content_type,last_mod_time,title,txt,docsize,last_index_time,next_index_time,referrer,keywords,description,crc32,category FROM url WHERE rec_id IN (6,8,15,26,79,119,147,150,158,166,404) ORDER BY rec_id [14960] SQL 0.01s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=-956346810 AND (status=200 OR status=304) [14960] SQL 0.01s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=-1089368843 AND (status=200 OR status=304) [14960] SQL 0.01s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=690675342 AND (status=200 OR status=304) [14960] SQL 0.00s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=967505276 AND (status=200 OR status=304) [14960] SQL 0.01s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=1672365359 AND (status=200 OR status=304) [14960] SQL 0.01s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=-762182779 AND (status=200 OR status=304) [14960] SQL 0.00s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=268438658 AND (status=200 OR status=304) [14960] SQL 0.03s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=-348977251 AND (status=200 OR status=304) [14960] SQL 0.01s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=-140427905 AND (status=200 OR status=304) [14960] SQL 0.00s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=962713810 AND (status=200 OR status=304) [14960] SQL 0.00s: SELECT rec_id,url,content_type,last_mod_time FROM url WHERE crc32=-1930052569 AND (status=200 OR status=304) Why are there those extra 11 SELECT calls after we've already gotten the information above? Why not do: SELECT rec_id,url,content_type,last_mod_time,title,txt,docsize,last_index_time,next_index_time,referrer,keywords,description,crc32,category FROM url WHERE rec_id IN (6,8,15,26,79,119,147,150,158,166,404) AND (status = 200 OR status = 304) ORDER BY rec_id And do it all in one query instead of 11? Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: threads not threading?
Why is it, with freebsd pthreads enabled, on a dual-CPU PIII, I never see more then one URL indexed at once? Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1998-11/threads.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1998-12/index.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1998-12/threads.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1999-01/index.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1999-01/threads.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1999-02/index.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1999-02/threads.html Indexer[20399]: [1] http://www.postgresql.org/mhonarc/pgsql-ports/1999-03/index.html Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: MySQL compared to Oracle8?
On Fri, 15 Sep 2000, Alain TESIO wrote: I am interested in some performance differency measurments of MySQL compared to Oracle8 when using UdmSearch. I know, the Oracle should be faster, but how much could we gain? How do you know ? You assume it's faster because you have to pay for it ? There are some benchmarks on the mysql site : http://www.mysql.com/information/benchmarks.html You may want to run your own benchmarks anyway, first because it will be yours, and then because most results in a benchmark are not significative for a search engine. Always take benchmarks with a grain of salt until those running the tests are working hand in hand with the other vendors ... they will almost always be biased towards the vendor, as they will rarely spend the time to get the tests to use features of the other vendors ... It's essentially read-only, and MySQL is very good for such an use, even if it has some drawbacks for other uses where you need concurrent read/write accesses. actually, there are features that MySQL doesn't support that would greatly help for search engines, main one being teh ability to use subselects so that you run one query to find an answer instead of one query per word being searched ... __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: using subselects, anyone looking at this?
a couple of weeks back, I sent in a suggestion on how to reduce query time by what worked out to be ~1/3 when searching multiple words ... including the exact query required ... has/is anyone looking at how to implement this? Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Over 9million records and slower then molasses ...
I'm running udmsearch 3.1.3 against a PostgreSQL v7.0.2 database which is on a Dual-PIII with 512Meg of RAM ... and queries are dog slow ... Looking things over, and I'm wondering if it might be an idea to reduce the string-length of the url field in the url table so that there is less of a string to do a LIKE query against ... Basically, create a new table called 'server' that contains two fields: server ( http://www.postgresql.org ) server_id ( 0 ) ... then, in the url table, have a field 'server_id' that refers to that, and strip off the 'http://www.postgresql.org' component from the 'url' field in the url table ... out of 9 million records, cutting off ~24bytes per record would greatly reduce that file, and, I think, improve the performance of the like search, such that this: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=-667920895 AND url.rec_id=ndict.url_id AND ((url.url || '') LIKE 'http://www.postgresql.org/mhonarc/pgsql-sql/%') would become: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=-667920895 AND url.rec_id=ndict.url_id AND url.url LIKE '/mhonarc/pgsql-sql/%') AND url.server = server.id AND server.server = 'http://www.postgresql.org'; Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Potential speed up for DBs supporting subselects ...
I'm finding that the more conditions you search on, the slower things become, to the point of unusable ... looking at the logs with DEBUG turned on, for instance, if I search on '4 words', it generates four quesries ... I'm wondering if using a subselect like the following would/could be used to speed up this process: SELECT url_id,intag FROM ndict WHERE word_id=572517542 AND EXISTS ( SELECT ndict.url_id FROM ndict,url WHERE ndict.word_id=-747320868 AND url.rec_id=ndict.url_id AND ((url.url || '') LIKE 'http://www.postgresql.org/mhonarc/pgsql-hackers/%')); Basically, the idea is that the first subquery reduces the number of url's that the top level has to scan to just those that satisfy the search for URLs with a word_id of -747320868 ... Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Over 9million records and slower then molasses ...
On Sun, 6 Aug 2000, Alain TESIO wrote: The Hermit Hacker [EMAIL PROTECTED] wrote: A simple subselect would eliminate the extra join ... Adding a subselect or the join is even regarding the performance loss, subselects can prevent the db engine from using the best indexes. As you have a big database, what about using it to test your suggestions ? I think there are bad but you may want to check. Try my suggestion with hashed url path parts, it's probably the best way to speed up a filter on url. whether you store the server url in the url table or a seperate table isn't going to matter much, I agree ... I don't agree that storing three different crc values is a good idea, but if even the url table could be modified to include a 'server' field that is searched via = vs LIKE, and remove that part of the substring from the url, that would help ... The problem is that something like PostgreSQL's LIKE condition is only significant to something like 15 characters for using an INDEX ... by removing, for example, the http://www.postgresql.org part of the url, which is the "Fixed" component for all URLs on that site, you've removed 24 characters from the LIKE search and moved the search to being *just* the part that is significant ... __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: ANNOUNCE 3.1.1
This one appears to be running much better then 3.1.0 ... its actually indexing this time :) On Fri, 30 Jun 2000, Alexander Barkov wrote: Hello! UdmSearch-3.1.1 is available from our site http://search.mnogo.ru/ From ChangeLog: * Nested categories support has been added (docs coming soon). * Tag type has been changed to CHAR instead of INT. The above changes require updating database structure. (Table "url"). "Category" indexer.conf command has been added * $CP template variable to display category path has been added * $CS template variable to display current category subtree has been added * "cat" search.cgi parameter has been added to pass the category to search through * $cat template variable to display current category ID. * A bug in cp1250 and iso-8859-2 support has been fixed * Some functions moved from log.c to udmutils.c * udm_snprintf for Solaris has been added * Some bugs have been fixed -- Alexander Barkov IZHCOM, Izhevsk email:[EMAIL PROTECTED] | http://www.izhcom.ru Phone:+7 (3412) 51-23-76 | Fax: +7 (3412) 78-70-10 ICQ: 7748759 __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED] Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Re: your mail
First suggestion: upgraded to v7.0.2 of PostgreSQL ... major difference in performance and stablity ... On Thu, 22 Jun 2000, Stanislav Kuba wrote: Hello! I have problem with udmsearch and postgresql. 1) If I use "DBMode crc-multi" or "DBMode multi" the indexer sometimes fails with error : - Indexer[7443]: [1] Error: 'pqReadData() -- backend closed the channel unexpectedly. This probably means the backend terminated abnormally before or while processing the request. --- The DBMode single is OK. 2) The PHP search sometimes fails with errors : -- An error occured! Query error: SELECT flag,lang,mask,find,repl FROM affix WHERE type='s' pqReadData() -- read() failed: errno=32 Broken pipe or An error occured! Query error: SELECT flag,lang,mask,find,repl FROM affix WHERE type='s' pqFlush() -- backend closed the channel unexpectedly. This probably means the backend terminated abnormally before or while processing the request. This happens with DBMode single. 3) The db connection is opened after search. Is it OK? Can you help me? I have Linux , Postgresql 6.5.3 and udmsearch 3.0.18 and other PHP+posgtres application work OK. Stnislav Kuba __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED] Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Indexer performance limits (MySQL locking?)
On Thu, 22 Jun 2000, J C Lawrence wrote: On Thu, 22 Jun 2000 15:46:20 -0300 (ADT) The Hermit Hacker [EMAIL PROTECTED] wrote: Try setting your Period higher and using the -n option to restrict the number of pages it does in an invocation ... for instance, set Period to 1week, and -n option to 20k ... this way it only processes 20k expired pages, and they will only expire again in a week ... This creates a different problem, and is why I have the Period set low: There are certain pages that must be spidered every index. Specifically, most of my site consists of mailing list archives. The various index pages for the messages in those archives need to be spidered every time to pick up the new messages. Summary: I need a long period to prevent everything being indexed every time, but I also need certain pages to be spidered for new URLs every single time the indexer runs. How to do? that one I'm interested in as well ... I've been suffering with things so far, as I didn't/don't think there is currently a way of treating 'subpages' seperately if the toplevel page is already being indexed ... Someway of doing: Period 604800 Server http://www.postgresql.org/% Period 86400 Server http://www.postgresql.org/mhonarc/pgsql-hackers Would be great ... __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Re: Damn, upgraded to 3.1.0 and now it won't index :(
On Wed, 21 Jun 2000, Alexander Barkov wrote: I've fixed this yesterday. This will be available in 3.1.1 soon. Some way of getting critical patches out like this needs to be implemented :( Anyway of getting anon-cvs or something going? - Original Message - From: The Hermit Hacker [EMAIL PROTECTED] To: Alexander Barkov [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, June 21, 2000 6:38 AM Subject: Damn, upgraded to 3.1.0 and now it won't index :( %sbin/indexer -n 10 etc/udmsearch.conf Indexer[35853]: indexer from UdmSearch v.3.1.0/PgSQL started with 'etc/udmsearch.conf' Indexer[35853]: [1] Done I've replaced my .conf file with the one that comes with the distribution, just in case I was missing something ... Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Damn, upgraded to 3.1.0 and now it won't index :(
%sbin/indexer -n 10 etc/udmsearch.conf Indexer[35853]: indexer from UdmSearch v.3.1.0/PgSQL started with 'etc/udmsearch.conf' Indexer[35853]: [1] Done I've replaced my .conf file with the one that comes with the distribution, just in case I was missing something ... Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Somewhat off-topic: bad performance using Postgres
okay, several questions here, first one being what version of PostgreSQL are you using? What hardware are you running on? I'm running UDMSearch 3.0.17 with PostgreSQL v7.0 on a Dual-PIII with 512Meg of RAM, with the databases sitting on an 18gig SCSI-3 hard drive ... I'm running in crc-multi mode, with a database currently taking up 1.8gig, and stats showing: %sbin/indexer -S etc/udmsearch.conf UdmSearch statistics StatusExpired Total - 200 17935 17935 OK 301 62 62 Moved Permanently 302 2 2 Moved Temporarily 304 51108 51108 Not Modified 401 1 1 Unauthorized 403 14 14 Forbidden 404344344 Not found 500 2 2 Internal Server Error 503 4 4 Service Unavailable - Total 69472 69472 And it definitely didn't take me a week to load it ... From my experience, UDMSearch isn't the most efficient in its queries, because MySQL doesn't support features that would *probably* improve it, namely the use of subqueries, but I haven't had the time to sit down and try and optimize the code, so am not 100% certain that that would help ... One thing that you might find helps is to do something like: while(1) indexer -n 5000 indexer.conf psql -c "vacuum analyze;" database end (or similar) ... so that it does periodic vacuums ... in 'near future' released of PostgreSQL, there are plans for an 'overwrite storage manager', but, for now, if you do an UPDATE on a record, it has to INSERT a new one and DELETE the old, so tables grow a little faster then you'd hope ... On Mon, 5 Jun 2000, Alain TESIO wrote: I've noticed that indexing becomes teriibly slow when using postgres as the backend (regardless of version), versus MySQL... anybody know why? Is it because PG supports transactions and more stuff than MySQL? I have noticed the same for another similar job. PostgreSQL has nice features, especially transactions (should be soon available in MySQL) With MySQL I was able to index 6 urls within a 24Hrs, using Postgres this will take significant more time - nearly a week! Anybody got suggestions? I don't know what kind of queries are executed during the indexing, if the insertions don't need indexes you could try to drop the index and recreate them when the indexing is done. Alain __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED] Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: 3.0.10 - 3.0.14: udmsearch.conf
Morning all ... Just re-installed .10 to test out whether or not I've gone crazy ... .10 works fine on my server, indexes everything ... .14 fails to instead http://www.postgresql.org at all ... Below is the conf file that I'm using on my server ... and have been pretty much since day one ... - DBPort 5432 DBHost db.hub.org DBName udmsearch DBUser pgsql DBPass DBMode crc Disallow /cgi-bin/ \.cgi /nph \? Disallow \.b$\.sh$ \.md5$ \.rpm$ Disallow \.arj$ \.tar$ \.zip$ \.tgz$ \.gz$ Disallow \.lha$ \.lzh$ \.tar\.Z$ \.rar$ \.zoo$ Disallow \.gif$ \.jpg$ \.jpeg$ \.bmp$ \.tiff$ \.xpm$ \.xbm$ Disallow \.vdo$ \.mpeg$ \.mpe$ \.mpg$ \.avi$ \.movie$ Disallow \.mid$ \.mp3$ \.rm$ \.ram$ \.wav$ \.aiff$ \.ra$ Disallow \.vrml$ \.wrl$ \.png$ Disallow \.exe$ \.cab$ \.dll$ \.bin$ \.class$ Disallow \.tex$ \.texi$ \.xls$ \.doc$ \.texinfo$ Disallow \.rtf$ \.pdf$ \.cdf$ \.ps$ Disallow \.ai$ \.eps$ \.ppt$ \.hqx$ Disallow \.cpt$ \.bms$ \.oda$ \.tcl$ Disallow \.o$ \.a$ \.la$ \.so$ \.so\.[0-9]$ Disallow \.pat$ \.pm$ \.m4$ \.am$ Disallow \?D=A$ \?D=A$ \?D=D$ \?M=A$ \?M=D$ \?N=A$ \?N=D$ \?S=A$ \?S=D$ Disallow /[.]{1,2} /\%2e /\%2f Disallow [^:]// AddType text/plain \.pl$ \.js$ \.txt$ \.h$ \.c$ \.pm$ \.e$ AddType text/html \.html$ \.htm$ AddType image/x-xpixmap \.xpm$ AddType image/x-xbitmap \.xbm$ AddType image/gif \.gif$ AddType application/unknown .* # Hub's stuff Server http://www.postgresql.org/ Server http://www.hub.org/ __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: udmsearch .14 ...
Just upgraded to .14, and due to some work I had to do on the database server, cleared out the database and was goign to rebuild it ... Using the same config from my .10 indexer, I can't index the site anymore ... just comes back with the following. Has something changed that would affect this? pgsql sbin/indexer etc/udmsearch.conf Indexer[29711]: indexer from UdmSearch v.3.0.14/PgSQL started with 'etc/udmsearch.conf' Indexer[29711]: [1] http://www.postgresql.org/robots.txt Indexer[29711]: [1] http://www.postgresql.org/ Indexer[29711]: [1] Done Actually, something has changed, as I used to just have: Server http://www.postgresql.org which will now generate: Indexer[38871]: indexer from UdmSearch v.3.0.14/PgSQL started with 'etc/udmsearch.conf' Indexer[38871]: [1] http:///robots.txt Indexer[38871]: [1] http://www.postgresql.org Indexer[38871]: [1] Unknown host Indexer[38871]: [1] Done Where it didn't before ... but what has changed to prevent me from being able to index that site? :( __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: udmsearch .14 ...
On Fri, 5 May 2000, J C Lawrence wrote: On Fri, 5 May 2000 21:11:28 -0300 (ADT) The Hermit Hacker [EMAIL PROTECTED] wrote: Just upgraded to .14, and due to some work I had to do on the database server, cleared out the database and was goign to rebuild it ... Is your robots.txt refusing access to the Indexer? Have you edited indexer.conf to tell it to ignore robots.txt? I don't have any robots.txt :( __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: udmsearch on a hosted area (ex. hway.net)
for those in this sort of situation, PostgreSQL, Inc does offer a third-party hosting service, where we'll run the indexing for you and provide you with binaries, as applicable ... check out http://www.pgsql.com/hosting.html for more details ... On Thu, 4 May 2000, Lone Directory wrote: Is it possible to install udmsearch on a web hosting site without having telnet access? is there a special procedure? thanks blackdir __ Do You Yahoo!? Send instant messages get email alerts with Yahoo! Messenger. http://im.yahoo.com/ __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED] Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Prob with 3.0.13
On Tue, 2 May 2000, Alexander Barkov wrote: Hi! - Original Message - From: Martin Ebert [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, May 02, 2000 2:38 AM Subject: UdmSearch: Prob with 3.0.13 User ist udm, database is udmsearch, no password. indexer says: # ./indexer Indexer[558]: indexer from UdmSearch v.3.0.13/PgSQL started with '/usr/local/udmsearch/etc/indexer.conf' Indexer[558]: [1] Error: 'ERROR: robots: Permission denied. Part of indexer.conf: DBAddr pgsql://udm:@localhost/udmsearch/ What's wrong? I tested again with PgSQL-6.5.3 and it does work fine on my RH-6.1 My DBAddr string was psql://test@localhost/test/ Another question: I want to play with "DBMode crc-multi". Should I create other tables? If so, how can I find a script? It is in /create/pgsql directory of UdmSearch sources. You have to use "ndictXX" tables definitions with related indexes. But the problem is that there is a strange bug in PostgreSQL-6.5.3 (previous versions are known to work fine). Postmaster crashes after a sequence of INSERTS performed by indexer. It starts work fine when "hash" index type is used instead of default "btree" with ndictXX tables. This is thing to be reported to PgSQL maling lists: strange behavour of "int4" field type indexes, but I had no time to do that yet :-( Just an FYI ... v7.0 is to be released any day now (we're just finishing cleaning up docs and that's about it) ... I'm currently running 3.0.10 on: pgsql /usr/local/udmsearch/sbin/indexer -S /usr/local/udmsearch/etc/udmsearch.conf UdmSearch statistics StatusExpired Total - 200 0 85151 OK 301 0120 Moved Permanently 401 0 1 Unauthorized 403 0 4 Forbidden 404 0581 Not found 500 0 2 Internal Server Error 503 0 6 Service Unavailable - Total 0 85865 With an ndict that has grown to: And a data/base directory of: udmsearch= select count(word_id) from ndict; count 10495461 (1 row) pgsql% du data/base/udmsearch 2637066 data/base/udmsearch And results that come back in an acceptable period of time using just the crc method ... Check her out at: http://www.postgresql.org/mhonarc/pgsql-hackers Under PgSQL, there are a few things you *have* to do to get any sort of performance though ... first one is disable-auto-wildcard ... its evil. The problem is that it creates a LIKE search of '%url%', so if you want to find all urls that *start with* 'http://www.' something, you pretty much screw up how PgSQL uses its indices ... All my 'ul' values state 'http://www.../%' ... Other then that, hoping Alexander got the LIKE query patch I submit'd awhile back into the tree, she pretty much flies ... Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Just some numbers ...
Morning all ... Well, UDMSearch is now fully loaded with the PostgreSQL Web Site, and a couple of other ones. The database reports itself as: udmsearch= select version(); version --- PostgreSQL 7.0.0 on i386-unknown-freebsdelf3.4, compiled by gcc 2.7.2.3 (1 row) Its actually beta5, so doesn't have Tom's LIKE fixes, only the QUERY work around for now. The dictionary (ndict) table contains: udmsearch= select count(word_id) from ndict; count --- 9695376 (1 row) And the url table contains: udmsearch= select count(rec_id) from url; count - 78943 (1 row) So not a particularly small database :) Disk space consumption, after a vacuum, is at: pgsql% du udmsearch 2238858 udmsearch The server that it is running on is a Dual-PIII 450 with 512Meg of RAM, and the database itself is sitting on an 18gig 10k RPM U2W SCSI hard drive ... and PostgreSQL is the only server running on this machine ... If I go to http://www.postgresql.org/search.cgi and search *all docs* for MVCC, it returns 565 URLs, in ~60secs, with the bulk of the search being spent in: [14180] START SQL 16535819.64s: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=572517542 AND url.rec_id=ndict.url_id AND ((url.url || '') LIKE 'http://www.postgresql.org/%') [14180] END SQL 52.47s All in all, not too bad ... :) Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Re: [HACKERS] Just some numbers ...
d'oh ... I *swore* I did a VACUUM ANALYZE before sending these in, but it turns out I only did a VACUUM :( Now, the query is down to: [40552] START SQL 16545848.04s: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=572517542 AND url.rec_id=ndict.url_id AND ((url.url || '') LIKE 'http://www.postgresql.org/%') [40552] END SQL 7.66s Same thing being searched (mvcc off of the URL below) ... explain: NOTICE: QUERY PLAN: Hash Join (cost=6672.12..8.80 rows=239 width=10) - Index Scan using n_word_id on ndict (cost=0.00..79566.29 rows=23916 width=6) - Hash (cost=6670.15..6670.15 rows=789 width=4) - Seq Scan on url (cost=0.00..6670.15 rows=789 width=4) Sweet :) The Hermit Hacker [EMAIL PROTECTED] writes: If I go to http://www.postgresql.org/search.cgi and search *all docs* for MVCC, it returns 565 URLs, in ~60secs, with the bulk of the search being spent in: [14180] START SQL 16535819.64s: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=572517542 AND url.rec_id=ndict.url_id AND ((url.url || '') LIKE 'http://www.postgresql.org/%') [14180] END SQL 52.47s What was the plan currently being generated for those, again? regards, tom lane Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Suggested patch for speed improvements under PostgreSQL ...
The following patch is meant to force PostgreSQL (including v7.0) to *not* try and use the index on a LIKE query. There is an overestimation of the speed of doing a LIKE search in PostgreSQL that values doing it before doing an equality one ... end result on my current search is that a search has to process ~60k*5million records, vs 11k*60k records to find the results ... slightly slower :) Only been tested manually so far, but is going in full bore in 5 minutes or so to test it u nder load ... *** sql.c Tue Apr 11 19:18:22 2000 --- sql2.c Fri Apr 14 11:59:09 2000 *** *** 1710,1716 __INDLIB__ int AddURLLimit(char *URL){ if(*urlstr)strcpy(UDM_STREND(urlstr)-1," OR "); elsestrcat(urlstr," AND ("); ! sprintf(UDM_STREND(urlstr),"url.url LIKE '%s')",URL); return(0); } __INDLIB__ int ClearLimits(){ --- 1710,1716 __INDLIB__ int AddURLLimit(char *URL){ if(*urlstr)strcpy(UDM_STREND(urlstr)-1," OR "); elsestrcat(urlstr," AND ("); ! sprintf(UDM_STREND(urlstr),"(url.url || '') LIKE '%s')",URL); return(0); } __INDLIB__ int ClearLimits(){ Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: [v7.0beta5] LIKE timings ...
Well, with that simple change of (url.url || ''), the results go to being much much more acceptable for a database whose ndict now contains 6million+ tuples :) [46327] START SQL 16213186.97s: SELECT word,lang FROM stopword [46327] END SQL 0.07s [46327] START SQL 16213187.04s: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=572517542 AND url.rec_id=ndict.url_id AND ((url.url || '') LIKE 'http://www.postgresql.org/docs/%') [46327] END SQL 1.55s [46327] START SQL 16213188.59s: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=13119092 AND url.rec_id=ndict.url_id AND ((url.url || '') LIKE 'http://www.postgresql.org/docs/%') [46327] END SQL 1.71s [46327] START SQL 16213190.30s: SELECT rec_id,url,content_type,last_modified,title,txt, docsize,last_index_time,next_index_time, referrer,keywords,description,crc FROM url WHERE rec_id=14 [46327] END SQL 0.03s [46327] START SQL 16213190.33s: SELECT rec_id,url,content_type,last_modified,title,txt, docsize,last_index_time,next_index_time, referrer,keywords,description,crc FROM url WHERE rec_id=1444 [46327] END SQL 0.02s [46327] START SQL 16213190.35s: SELECT rec_id,url,content_type,last_modified,title,txt, docsize,last_index_time,next_index_time, referrer,keywords,description,crc FROM url WHERE rec_id=1598 [46327] END SQL 0.04s [46327] START SQL 16213190.39s: SELECT rec_id,url,content_type,last_modified,title,txt, docsize,last_index_time,next_index_time, referrer,keywords,description,crc FROM url WHERE rec_id=1723 [46327] END SQL 0.15s [46327] START SQL 16213190.55s: SELECT rec_id,url,content_type,last_modified FROM url WHERE crc='2aa2ad0362d064152caaed1010e6e3f9' [46327] END SQL 0.83s [46327] START SQL 16213191.38s: SELECT rec_id,url,content_type,last_modified FROM url WHERE crc='b8d5691766f8ef8c59ac42f795a0a084' [46327] END SQL 0.36s [46327] START SQL 16213191.73s: SELECT rec_id,url,content_type,last_modified FROM url WHERE crc='82f5c87cd45da6c97a25804f71e370a5' [46327] END SQL 0.35s [46327] START SQL 16213192.09s: SELECT rec_id,url,content_type,last_modified FROM url WHERE crc='80494d661646144b02fd8aed89353797' [46327] END SQL 0.35s Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Re: [HACKERS] [v7.0beta5] LIKE timings ...
On Fri, 14 Apr 2000, Tom Lane wrote: The Hermit Hacker [EMAIL PROTECTED] writes: Well, with that simple change of (url.url || ''), the results go to being much much more acceptable for a database whose ndict now contains 6million+ tuples :) So are we going to stand down from panic mode, or should I still plan on squeezing in a first-cut LIKE estimator tomorrow? I say stand down ... things appear to be working great as they are, but a more permanent fix should still be at the top of the list for v7.1, since I don't imagine that we're always going to be able to come up with that easy of a fix :) __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: A URL that isn't indexable ... ?
http://greatcircle.com/lists/majordomo-users/archive.html it indexes taht page, but none of the sub-links ... Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: A URL that isn't indexable ... ?
d'oh :( thanks ... On Mon, 13 Mar 2000, andy butz wrote: hi Marc, it indexes that page, but none of the sub-links ... no wonder: http://greatcircle.com/robots.txt FILE robots.txt ++ # robots.txt for http://www.greatcircle.com/ # contact [EMAIL PROTECTED] for more information User-agent: * Disallow: /firewalls/archive/ Disallow: /firewalls/mhonarc/ Disallow: /list-managers/archive/ Disallow: /list-managers/mhonarc/ Disallow: /lists/firewalls/archive/ Disallow: /lists/firewalls/mhonarc/ Disallow: /lists/list-managers/archive/ Disallow: /lists/list-managers/mhonarc/ Disallow: /lists/majordomo-announce/archive/ Disallow: /lists/majordomo-announce/mhonarc/ Disallow: /lists/majordomo-docs/archive/ Disallow: /lists/majordomo-docs/mhonarc/ Disallow: /lists/majordomo-users/archive/ Disallow: /lists/majordomo-users/mhonarc/ Disallow: /lists/majordomo-workers/archive/ Disallow: /lists/majordomo-workers/mhonarc/ The Hermit Hacker wrote: http://greatcircle.com/lists/majordomo-users/archive.html it indexes taht page, but none of the sub-links ... Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED] __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED] Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Its funny ...
I'm trying to figure out why udmsearch w/ PostgreSQL hurts so much ... I just tried to start a search with an ndict of ~5million tuples, and a total of about 50k documents, and I just got a 'zero sized reply' from the backend for something as simple as 'SQL' ... Looking at hte backend/server, I see tis process: 81779 ?? S 72:39.47 /home/database/v6.5.3/bin/postgres pgsql 216.126.84.1 pgsearch LOCK TABLE waiting Why is udmsearch doing any locking? With MVCC, it is not required, and just slows everything down where it doesn't need to ... :( Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: Slowly, we're getting somewhere ...
Is it possible to get the PID of the CGI added to the statistics too? Now I get a bunch of lines of: SQL 0.24s: SELECT word,lang FROM stopword SQL 0.10s: SELECT word,lang FROM stopword SQL 0.11s: SELECT word,lang FROM stopword SQL 0.09s: SELECT word,lang FROM stopword SQL 0.11s: SELECT word,lang FROM stopword SQL 0.16s: SELECT word,lang FROM stopword SQL 0.09s: SELECT word,lang FROM stopword SQL 263.27s: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=-606704929 AND url.rec_id=ndict.url_id AND (url.url LIKE 'http://www.postgresql.org/%') SQL 0.61s: SELECT word,lang FROM stopword SQL 0.16s: SELECT word,lang FROM stopword SQL 0.32s: SELECT word,lang FROM stopword SQL 0.08s: SELECT word,lang FROM stopword SQL 0.57s: SELECT word,lang FROM stopword SQL 0.10s: SELECT word,lang FROM stopword SQL 0.16s: SELECT word,lang FROM stopword SQL 0.08s: SELECT word,lang FROM stopword SQL 0.27s: SELECT word,lang FROM stopword SQL 57.02s: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=-1183739328 AND url.rec_id=ndict.url_id AND (url.url LIKE 'http://www.postgresql.org/docs/%') SQL 10.09s: SELECT ndict.url_id,ndict.intag FROM ndict,url WHERE ndict.word_id=-1183739328 AND url.rec_id=ndict.url_id AND (url.url LIKE 'http://www.postgresql.org/docs/admin/%') SQL 0.12s: SELECT rec_id,url,content_type,last_modified,title,txt, docsize,last_index_time,next_index_time, referrer,keywords,description,crc FROM url WHERE rec_id=416 But haven't got a clue who belongs to what ... Also, why would I get a bunch of 'SELECT ... stopword' queries, with what appears to be no subsequent queries? Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: CURSORs ...
is anyone out there workign on extending udmsearch w/PostgreSQL? performance on a PIII 450 w/ 758Meg of RAM is pretty much atrocious, and one of the things that I'm wondering is what sort of performance boast going to using CURSORs would give? Basically, instead of returning all the rows to the frontend that satisfy the search, just return the 20 rows that will be displayed ... So far as I'm able to see, udmsearch is great for small sites, but as soon as I throw any amount of documents in, it slows down to being pretty much unusable, and I'd *really* like to figure out a way of fixing that :( Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
UdmSearch: udmsearch stable gets periodic crashes ...
Using FreeBSD 3.4-RC and PostgreSQL 6.5.0, I get the hfollowing periodically: pgsql sbin/indexer UdmSearch[58068]: indexer from UdmSearch v.2.2.1/PostgreSQL started with config /home/projects/pgsql/etc/indexer.conf UdmSearch[58068]: http://www.postgresql.org/docs/pgsql/src/test/suite/quote.sql UdmSearch[58068]: Error: Error: 'ERROR: parser: parse error at or near "world" If I run it a second time, it goes through fine... Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]