UdmSearch: space in URL causes error
When indexing (version 3.1.10), any URL with a space (%20) causes the error "Too many network errors for this server, skipped", even though the URL loads fine in a browser.

...
Indexer[21663]: [1] http://www.co.dakota.mn.us/socialservices/chcare/COMPLAINTS.htm
Indexer[21663]: [1] http://www.co.aitkin.mn.us/board%20minutes/2000/July%2025.htm
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1] http://www.co.aitkin.mn.us/board%20minutes/2000/November%207.html
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1] http://www.co.dakota.mn.us/parks/ski%20pass.htm
Indexer[21663]: [1] http://www.co.aitkin.mn.us/board%20minutes/2000/November%2014.html
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1] http://www.co.aitkin.mn.us/board%20minutes/2000/November%2021.html
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1] http://www.co.aitkin.mn.us/board%20minutes/2000/November%2028.html
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1] http://www.co.aitkin.mn.us/board%20minutes/2000/April%2025.htm
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1] http://www.co.aitkin.mn.us/board%20minutes/2000/August%201.htm
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1] http://www.co.aitkin.mn.us/board%20minutes/2000/August%204.htm
Indexer[21663]: [1] Too many network errors for this server, skipped
...

__ Do You Yahoo!? Get personalized email addresses from Yahoo! Mail - only $35 a year! http://personal.mail.yahoo.com/
__ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
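A plausible cause for this kind of failure is that the space must stay percent-encoded in the HTTP request line; if an indexer decodes %20 before building its GET request, the server sees a malformed request and the fetch fails even though a browser (which keeps the encoding) succeeds. A minimal sketch of the distinction, using a path from the log above (standard library only; this is an illustration, not the mnogosearch code):

```python
from urllib.parse import quote, unquote

# The on-disk file name contains literal spaces; the URL must encode them.
path = "/board minutes/2000/July 25.htm"
encoded = quote(path)  # quote() keeps '/' unescaped by default
assert encoded == "/board%20minutes/2000/July%2025.htm"

# Decoding before sending reintroduces the raw space, which is illegal
# inside the "GET <path> HTTP/1.0" request line.
raw = unquote(encoded)
assert " " in raw
```

If the indexer's retry logic counts each such malformed request as a network error, a handful of %20 URLs would quickly trip a "too many network errors" threshold for the whole server.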
Re: UdmSearch: no files found in mirror directories
--- Alexander Barkov [EMAIL PROTECTED] wrote:
> I tried it with your indexer.conf and everything works fine. Check
> that you have enough permissions to write to those directories.

That was one of the first things I checked. I ran indexer as root, so it shouldn't matter, but yes, write permission is there. Would it make any difference that the URLs were loaded into the db using 'indexer -i -f urls.txt' first, and only then I changed indexer.conf to add the mirror settings?

> Caffeinate The World wrote:
> --- Alexander Barkov [EMAIL PROTECTED] wrote:
> > That's strange for me. I've just checked this config and everything
> > works fine:
> >
> > DBAddr mysql://foo:bar@localhost/udm/
> > MirrorRoot /usr/local/mnogosearch/var/mirror/
> > Realm http://localhost/*
> > URL http://localhost/
>
> I've seen URLs like *.mn.us/* being indexed, but still nothing in the
> mirror directories. This is very odd.
>
> Caffeinate The World wrote:
> > Mirrors command must be used BEFORE Server commands; they are a
> > per-server command, so you can use different mirror locations for
> > different sites.
>
> ...
> #MaxWordLength 32
> #DeleteBad no
> Index yes
> Follow path
> # store a copy of each page locally
> MirrorRoot /data/mnogosearch/mirror/pages
> MirrorHeadersRoot /data/mnogosearch/mirror/headers
> MirrorPeriod 6m
> Server site http://www.state.mn.us/
> Server site http://www.mnworkforcecenter.org/
> Server site http://www.exploreminnesota.com/
> Server site http://www.tpt.org/
> Server page http://www.gorp.com/gorp/location/mn/mn.htm
> Server path http://lists.rootsweb.com/index/usa/MN/
> #Server site http://www.mallofamerica.com/
> ...
>
> So it is before the Server commands. However, I'm indexing from a
> list of URLs which may not have a Server command. Is there a way to
> mirror all URLs we index, like a 'MirrorAll yes' or something? In my
> case I use a Realm *.mn.us/* to index arbitrary sites that match;
> there is no way to know in advance what the server is, so there is no
> way to provide a mirror setting specifically for it.
Re: UdmSearch: no files found in mirror directories
--- Zenon Panoussis [EMAIL PROTECTED] wrote:
> Caffeinate The World skrev:
> > i have indexer going but i see nothing in the mirror directories.
> > when does it store the pages to the mirror directory?
>
> If your pages are already indexed, when you re-index with -a indexer
> will check the headers and only download files that have been
> modified since the last indexing. Thus, all pages that are not
> modified will not be downloaded and therefore not mirrored either. To
> create the mirror you need to either (a) start again with a clean
> database or (b) use the -m switch.

I was indexing with a clean slate (sort of). I inserted about 500,000 URLs into the db from an external file using 'indexer -i -f url.file'. That was before I had any mirror options in indexer.conf. After all the URLs were inserted, and about 10K of the URLs were indexed, I stopped indexer and edited indexer.conf to add the mirror options. So technically there are still 490K URLs left that have never been indexed, and at least some of them should get mirrored.
Re: UdmSearch: Webboard: Segfault (grrr)
--- Alexander Barkov [EMAIL PROTECTED] wrote:
> Caffeinate The World wrote:
> > --- Alexander Barkov [EMAIL PROTECTED] wrote:
> > > Hello! We finally found a bug in cache.c. The new version is in
> > > the attachment. Everybody who has problems with splitter's
> > > crashes is welcome to test.
> >
> > should the 'tree' directory be removed? can we split the raw log
> > files we have thus far or is re-indexing necessary?
>
> I hope it should work without having to remove the tree directory,
> but it is better to remove it. It is safe to use the old /raw and
> /splitter files without having to reindex.

OK. What exactly was the bug?
Re: UdmSearch: Webboard: Segfault (grrr)
--- Alexander Barkov [EMAIL PROTECTED] wrote:
> Hello! We finally found a bug in cache.c. The new version is in the
> attachment. Everybody who has problems with splitter's crashes is
> welcome to test.

Should the 'tree' directory be removed? Can we split the raw log files we have thus far, or is re-indexing necessary?
Re: UdmSearch: Webboard: Segfault (grrr)
I didn't get this error on my NetBSD/Alpha; the compile was fine. What system are you on?

--- Zenon Panoussis [EMAIL PROTECTED] wrote:
> Alexander Barkov skrev:
> > We finally found a bug in cache.c. The new version is in the
> > attachment. Everybody who has problems with splitter's crashes is
> > welcome to test. Please, give feedback!
>
> Oops. Something else is not OK:
>
> cache.c:687:87: warning: #ifdef with no argument
> cache.c:692:87: warning: #ifdef with no argument
> cache.c:697:87: warning: #ifdef with no argument
> cache.c:702:87: warning: #ifdef with no argument
> cache.c: In function `UdmFindCache':
> cache.c:969: parse error before `?'
> cache.c:982: `real_num' undeclared (first use in this function)
> cache.c:982: (Each undeclared identifier is reported only once
> cache.c:982: for each function it appears in.)
> cache.c:994: `fd1' undeclared (first use in this function)
> cache.c:996: `group' undeclared (first use in this function)
> cache.c:1000: `group_num' undeclared (first use in this function)
> cache.c: At top level:
> cache.c:1011: initializer element is not constant
> cache.c:1011: warning: data definition has no type or storage class
> cache.c:1012: parse error before string constant
> cache.c:1013: parse error before string constant
> cache.c:1013: warning: data definition has no type or storage class
> cache.c:1014: redefinition of `ticks'
> cache.c:1011: `ticks' previously defined here
> cache.c:1014: initializer element is not constant
> cache.c:1014: warning: data definition has no type or storage class
> cache.c:1015: parse error before string constant
> cache.c:1015: warning: data definition has no type or storage class
> cache.c:1024: `i' undeclared here (not in a function)
> cache.c:1024: parse error before `.'
> cache.c:1030: register name not specified for `p'
> cache.c:1032: parse error before `if'
> cache.c:1035: `pmerg' undeclared here (not in a function)
> cache.c:1035: `pmerg' undeclared here (not in a function)
> cache.c:1035: warning: data definition has no type or storage class
> cache.c:1036: parse error before `'
> cache.c:1043: `k' undeclared here (not in a function)
> cache.c:1043: warning: data definition has no type or storage class
> cache.c:1044: parse error before `}'
> cache.c:1046: conflicting types for `p'
> cache.c:1030: previous declaration of `p'
> cache.c:1046: `pmerg' undeclared here (not in a function)
> cache.c:1046: warning: data definition has no type or storage class
> cache.c:1047: parse error before `'
> cache.c:1048: parse error before `-'
> cache.c:1058: warning: initialization makes integer from pointer without a cast
> cache.c:1058: warning: data definition has no type or storage class
> cache.c:1058: parse error before `}'
> cache.c:1061: redefinition of `ticks'
> cache.c:1014: `ticks' previously defined here
> cache.c:1061: initializer element is not constant
> cache.c:1061: warning: data definition has no type or storage class
> cache.c:1063: parse error before string constant
> cache.c:1071: warning: parameter names (without types) in function declaration
> cache.c:1071: conflicting types for `UdmGroupByURL'
> ../include/udm_searchtool.h:7: previous declaration of `UdmGroupByURL'
> cache.c:1071: warning: data definition has no type or storage class
> cache.c:1072: parse error before `}'
> make[1]: *** [cache.lo] Error 1
> make[1]: Leaving directory `/root/mnogosearch-3.1.10/src'
> make: *** [all-recursive] Error 1
>
> -- oracle@everywhere: The ephemeral source of the eternal truth...
UdmSearch: no files found in mirror directories
I'm trying to store all web pages locally so I don't have to fetch them from the internet each time I re-index. I have indexer going, but I see nothing in the mirror directories. When does it store the pages to the mirror directory?

# grep Mirror indexer.conf
MirrorRoot /data/mnogosearch/mirror/pages
MirrorHeadersRoot /data/mnogosearch/mirror/headers
MirrorPeriod 6m
# ls -l /data/mnogosearch/mirror/*
/data/mnogosearch/mirror/headers:
/data/mnogosearch/mirror/pages:
Re: UdmSearch: Webboard: Segfault (grrr)
--- Alexander Barkov [EMAIL PROTECTED] wrote:
> > i completely forgot about this feature!!! i read about it when i
> > first started using mnogosearch, but never bothered to use it.
> > with the mirror feature, wouldn't it be easy to implement Google's
> > "cache" feature, where the user can view a cache of the page from
> > the last time you indexed?
>
> I think it's possible. Moreover, we may use zlib to compress those
> files, so they'll use less space. The only disadvantage is that it
> will not work on huge search engines with millions of documents.
> There is a limit on the total number of files on a file system in
> most unixes. For example, my 30G /usr partition on a FreeBSD box can
> create about 8 million files.

Is that a per-file-system limit or a per-unix-box limit?
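The zlib idea is easy to sketch: a mirrored HTML page round-trips through zlib losslessly, and markup-heavy text compresses well, so the cached copies would take noticeably less disk. A minimal illustration (standard library only; this is not the mnogosearch code, just the technique being proposed):

```python
import zlib

# A repetitive, markup-heavy page body, standing in for a mirrored file.
page = b"<html><body><p>cached copy of the indexed page</p></body></html>" * 50

stored = zlib.compress(page, 6)          # what would go in the mirror dir
assert len(stored) < len(page)           # markup compresses well
assert zlib.decompress(stored) == page   # lossless round trip
```

Note that compression shrinks each file but does not help with the per-file-system inode limit discussed above, since the number of files stays the same.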
Re: UdmSearch: no files found in mirror directories
--- Alexander Barkov [EMAIL PROTECTED] wrote:
> That's strange for me. I've just checked this config and everything
> works fine:
>
> DBAddr mysql://foo:bar@localhost/udm/
> MirrorRoot /usr/local/mnogosearch/var/mirror/
> Realm http://localhost/*
> URL http://localhost/

It's not working on my system because it hasn't indexed the URLs matching *.mn.us/* yet. It's indexing other URLs that were fed to it via an external list (indexer -i -f url_list.txt), and some of the URLs in that list don't fit the pattern '*.mn.us/*'. I could add 'Realm *' to get it to mirror any site, but that would tell indexer to follow and index anything, which is not what I want. What I'm looking for is some parameter like DeleteNoServer, but for mirroring: it would mirror all URLs already in the db or fed to it by an external list.

> Caffeinate The World wrote:
> > Mirrors command must be used BEFORE Server commands; they are a
> > per-server command, so you can use different mirror locations for
> > different sites.
>
> ...
> #MaxWordLength 32
> #DeleteBad no
> Index yes
> Follow path
> # store a copy of each page locally
> MirrorRoot /data/mnogosearch/mirror/pages
> MirrorHeadersRoot /data/mnogosearch/mirror/headers
> MirrorPeriod 6m
> Server site http://www.state.mn.us/
> Server site http://www.mnworkforcecenter.org/
> Server site http://www.exploreminnesota.com/
> Server site http://www.tpt.org/
> Server page http://www.gorp.com/gorp/location/mn/mn.htm
> Server path http://lists.rootsweb.com/index/usa/MN/
> #Server site http://www.mallofamerica.com/
> ...
>
> So it is before the Server commands. However, I'm indexing from a
> list of URLs which may not have a Server command. Is there a way to
> mirror all URLs we index, like a 'MirrorAll yes' or something? In my
> case I use a Realm *.mn.us/* to index arbitrary sites that match;
> there is no way to know in advance what the server is, so there is no
> way to provide a mirror setting specifically for it.
Re: UdmSearch: Webboard: Segfault (grrr)
--- Alexander Barkov [EMAIL PROTECTED] wrote:
> Caffeinate The World wrote:
> > > The only disadvantage is that it will not work on huge search
> > > engines with millions of documents. There is a limit on the
> > > total number of files on a file system in most unixes. For
> > > example, my 30G /usr partition on a FreeBSD box can create about
> > > 8 million files.
> >
> > is that a per file system limit or per unix box limit?
>
> Per-file-system limit.

Couldn't you do something like mounting multiple file systems:

sd0a /data/part1
sd1a /data/part2
...
sdna /data/partn

Wouldn't that work?
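Spreading the mirror across several mounts would also need a deterministic rule for deciding which partition a given document lands on; hashing the URL is the usual trick. A hypothetical sketch (the mount-point list and function name are invented for illustration, not mnogosearch features):

```python
import hashlib

# Hypothetical mount points, one file system each, as in the
# sd0a/sd1a/... layout suggested above.
MOUNTS = ["/data/part1", "/data/part2", "/data/part3"]

def mirror_root(url: str) -> str:
    """Pick a mirror partition for a URL by hashing it.

    The same URL always maps to the same partition, so re-indexing
    finds the previously mirrored copy, and the inode load is spread
    roughly evenly across the file systems.
    """
    h = int(hashlib.md5(url.encode("utf-8")).hexdigest(), 16)
    return MOUNTS[h % len(MOUNTS)]

root = mirror_root("http://www.state.mn.us/")
assert root in MOUNTS
assert root == mirror_root("http://www.state.mn.us/")  # deterministic
```

Each partition still has its own file limit, but the total capacity scales with the number of mounts.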
Re: UdmSearch: no files found in mirror directories
--- Alexander Barkov [EMAIL PROTECTED] wrote:
> That's strange for me. I've just checked this config and everything
> works fine:
>
> DBAddr mysql://foo:bar@localhost/udm/
> MirrorRoot /usr/local/mnogosearch/var/mirror/
> Realm http://localhost/*
> URL http://localhost/

I've seen URLs like *.mn.us/* being indexed, but still nothing in the mirror directories. This is very odd.

> Caffeinate The World wrote:
> > Mirrors command must be used BEFORE Server commands; they are a
> > per-server command, so you can use different mirror locations for
> > different sites.
>
> ...
> #MaxWordLength 32
> #DeleteBad no
> Index yes
> Follow path
> # store a copy of each page locally
> MirrorRoot /data/mnogosearch/mirror/pages
> MirrorHeadersRoot /data/mnogosearch/mirror/headers
> MirrorPeriod 6m
> Server site http://www.state.mn.us/
> Server site http://www.mnworkforcecenter.org/
> Server site http://www.exploreminnesota.com/
> Server site http://www.tpt.org/
> Server page http://www.gorp.com/gorp/location/mn/mn.htm
> Server path http://lists.rootsweb.com/index/usa/MN/
> #Server site http://www.mallofamerica.com/
> ...
>
> So it is before the Server commands. However, I'm indexing from a
> list of URLs which may not have a Server command. Is there a way to
> mirror all URLs we index, like a 'MirrorAll yes' or something? In my
> case I use a Realm *.mn.us/* to index arbitrary sites that match;
> there is no way to know in advance what the server is, so there is no
> way to provide a mirror setting specifically for it.
Re: UdmSearch: Webboard: 3.1.10 Won't Make or Make install
--- Adrift [EMAIL PROTECTED] wrote:
> Author: Adrift
> Email: [EMAIL PROTECTED]
> Message: Every version of mysql I have installed worked perfectly;
> the install ran smoothly (I am using FreeBSD 3.4). When I tried to
> "make" the new version of mnogosearch, 3.1.10, I got the error:
>
> Making all in src
> "Makefile", line 390: Need an operator

What does line 390 look like?
Re: UdmSearch: Webboard: CAN I SEND SIGNAL 'TERM' TO INDEXER PROGRAM?
--- Anonymous [EMAIL PROTECTED] wrote:
> Author: pokistu
> Email:
> Message: Actually I am running indexer, but it is 'eating' a lot of
> system memory. I want to know if I can stop the indexer program with
> the 'term' signal (linux) without the DATABASE being corrupted.

I do it and I've not noticed any problems. I run postgresql.
UdmSearch: cache mode and table dict
Is the table 'dict' used at all in cache mode? Mine doesn't have any records.
Re: UdmSearch: Webboard: Segfault (grrr)
I've been going through this, back and forth, time and time again. What would really be nice is if indexer saved the logs in a format that is easy to reuse: for instance, re-indexing into SQL without having to crawl through all the external websites again. That saves a lot of time and lets us debug faster.

--- Zenon Panoussis [EMAIL PROTECTED] wrote:
> Zenon Panoussis skrev:
> > Now for 31 MB adventures :)
> >
> > # ./run-splitter -k
> > Sending -HUP signal to cachelogd... Done
> > # ./run-splitter -p
> > Preparing logs...
> > Open dir '/var/mnogo3110/raw'
> > Preparing word log 982024900 [   42176 bytes]
> > Preparing word log 982027284 [31465324 bytes]
> > Preparing word log 982027618 [ 8815804 bytes]
> > Preparing del log 982024900
> > Preparing del log 982027284
> > Preparing del log 982027618
> > Renaming logs... Done
> >
> > Running ./run-splitter on these worked fine. No problems at all.
> > After that, I went on indexing and created
> >
> >    59920 Feb 13 06:05 982040748.del.done
> > 31457740 Feb 13 06:05 982040748.wrd.done
> >     1480 Feb 13 06:06 982040807.del.done
> >   637240 Feb 13 06:06 982040807.wrd.done
> >    51920 Feb 13 07:21 982045300.del.done
> > 31469304 Feb 13 07:21 982045300.wrd.done
> >    69248 Feb 13 07:51 982047843.del.done
> > 30213344 Feb 13 07:51 982047843.wrd.done
> >
> > another two 31 MB files and two smaller ones. All of them were
> > split without problems.
>
> [two days later] Indexing kept crashing (see separate posting) and
> splitting kept going fine until tonight, when the opposite occurred.
> By now, I have almost 1 GB of indexed files, 4 indexer crashes and
> one splitter crash. I'll do the debugging and post its output
> tomorrow.
>
> Z
Re: UdmSearch: Webboard: Segfault (grrr)
In my tests your 3 little files wouldn't make a difference. He would have to run splitter -p and splitter on all the files, starting from the first original raw file and including all the 31 MB files. I believe in my case it was the original 31 MB file which caused the problem: while processing the first 31 MB file it didn't core dump, but all the files that followed did cause core dumps at unpredictable times, though often at the same location initially (i.e. 77C3000...). Therefore, in order to recreate the scenario, one would have to start from the first raw file. I've tarred up such a series of files for Alex; perhaps he'll be able to find out why. My hypothesis is an array or buffer overflow in splitter.c.

--- Zenon Panoussis [EMAIL PROTECTED] wrote:
> Alexander Barkov skrev:
> > Can you guys give us a log file produced by splitter -p which
> > caused the crash? We can't reproduce the crash :-(
>
> Huh? splitter doesn't accept the -v5 argument, so it won't give more
> detailed logs than the normal ones. The only log I had, that to
> stdout, is the one I included with my first posting in this thread:
>
> Delete from cache-file /var/mnogo319/tree/12/B/12BFD000
> /var/mnogo319/tree/12/C/12C1 old: 69 new: 1 total: 70
> ./run-splitter: line 118: 18790 Segmentation fault (core dumped) $SPLITTER
>
> Until this point everything was normal. Anyway, as I said, I strongly
> suspect corruption in the word database. On a previous occasion when
> this happened, I deleted the entire tree/* directory structure and
> started all over again. Splitter worked like a dream with both small
> and big log files until one of the following occurred:
>
> 1. I stopped indexer with ^C and then ran splitter, or
> 2. splitter had to work itself through some 31 MB files. (These
>    files are not all the same size; they tend to get slightly bigger
>    the more there are, i.e. something like this:
>    0001.log  31.500.000 bytes
>    0002.log  31.550.000 bytes
>    0003.log  31.580.000 bytes
>    sort of.)
>
> Unfortunately I haven't been making notes, so I can't tell for sure
> which one of these two things happened before things stopped working.
> I tried splitter again today with ./splitter splitter.log. It went in
> a very normal way *almost* as far as yesterday, and then hung so
> badly that not even kill -9 could kill it. The log of this run looks
> like
>
> snip normal operation
> Delete from cache-file /var/mnogo319/tree/12/B/12B27000
> Delete from cache-file /var/mnogo319/tree/12/B/12B2D000
> Delete from cache-file /var/mnogo319/tree/12/B/12B3
> Delete from cache-file /var/mnogo319/tree/12/B/12B31000
> Delete from cache-file /var/mnogo319/tree/12/B/12B3
>
> I am attaching the three files that could be involved, namely
> tree/12/B/12B31000, 12B32000 and 12B35000. I'll install 3.1.10 now,
> try it on the old word database and see what it does. If it doesn't
> work, I'll remove the word database and start again from scratch.
> I'll try to make detailed notes this time and report back.
>
> Z
>
> -- oracle@everywhere: The ephemeral source of the eternal truth...
>
> ATTACHMENT part 2 application/x-gzip name=wordfiles.tar.gz
Re: UdmSearch: Webboard: This is SHITE!!!
You can try http://aspseek.com; I think that's the other one based on mnogo. Or was it aspsearch.com? Argh, I forget. There is also htdig. Check them out.

As far as mnogo goes: no one is getting paid for development here. People spend their time coding and release it for free. Yes, I agree the docs could use some help, but then English isn't these guys' native tongue. In addition, if anyone wants to write the docs, by all means, no one is stopping you from contributing.

I know at times it can be frustrating when things don't seem to go right, but you have to be patient, because there are many bugs, many platforms, variations of OSes, etc. I've been describing a bug in cache mode for over a month, and they are working on tracking it down. It's hard when I see it on my system but they can't reproduce it. As of late, a few others have cited the same bug in cache mode. See, I've been patient for over a month, and during that time I've tried different scenarios to help narrow down the problem.

My point is, I'm not paying these guys, so I don't really have any right to whine. The best I can do is to try to help with whatever effort I can contribute, and if I can't, I just try to be patient while they do their work. Posting a comment such as yours won't help get things working for you. In fact, it may hinder your progress and others', as it may piss them off. We all get frustrated at times, but try to describe your problems in a detailed manner, and be patient. Alex, Serge, and others have been more than generous with their time and efforts, but like everything else in life that's free, there is no guarantee.

--- Anonymous [EMAIL PROTECTED] wrote:
> Author: Joe B
> Email:
> Message: Hello ALL, After spending nearly 3 days trying to get this
> thing to work, I have come to the conclusion that it is a waste of
> time and a JOKE :-( The documentation is poor and the support I am
> getting from this board is daft. Does anyone else know of any
> alternative? If so, please let me know. See my postings below to see
> the problems I have been having and the replies I get, and you will
> see why I am feeling this way.
>
> Reply: http://search.mnogo.ru/board/message.php?id=1335
Re: UdmSearch: Webboard: Segfault (grrr)
I reported this problem a while back; I believe it's being worked on. At least they recently found the bug why it wasn't splitting out to FFF. The segfault happens during the splitter process, not indexing. I've been running splitter when the logs are at about 2 MB and I've not had splitter core dump on me yet, but before, when I let the log files build up to about 15 to 30 MB, I had that core dump problem. I hope this will be resolved soon because it's a pain in the behind. ;-(

--- Zenon Panoussis [EMAIL PROTECTED] wrote:
> Author: Zenon Panoussis
> Email: [EMAIL PROTECTED]
> Message: RH Linux 7.0, search 3.1.9, MySQL 3.23.29, cache mode, with
> the new patches for cache.c and sql.c. It happens all the time. It
> started happening when "maximum size" 31 MB log files were indexed,
> but by now it happens on any indexing, no matter how big or small the
> log file, as if the database somehow was corrupt:
>
> Delete from cache-file /var/mnogo319/tree/12/B/12BFD000
> /var/mnogo319/tree/12/C/12C1 old: 69 new: 1 total: 70
> ./run-splitter: line 118: 18790 Segmentation fault (core dumped) $SPLITTER
>
> For the same log file it always crashes at the same index file (e.g.
> every time I try to reindex 12345678.log it will crash at
> tree/12/3/4567000). If I delete the log file and start again with a
> new log file, it will crash at a different place, but it will still
> be consistent in crashing at the same place every time. And the
> backtrace:
>
> # gdb splitter core
> GNU gdb 5.0 [...]
> This GDB was configured as "i386-redhat-linux"...
> Core was generated by `/usr/local/mnogo319/sbin/splitter'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /usr/lib/mysql/libmysqlclient.so.10...done.
> Loaded symbols for /usr/lib/mysql/libmysqlclient.so.10
> Reading symbols from /lib/libm.so.6...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /usr/lib/libz.so.1...done.
> Loaded symbols for /usr/lib/libz.so.1
> Reading symbols from /lib/libc.so.6...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /lib/libcrypt.so.1...done.
> Loaded symbols for /lib/libcrypt.so.1
> Reading symbols from /lib/libnsl.so.1...done.
> Loaded symbols for /lib/libnsl.so.1
> Reading symbols from /lib/ld-linux.so.2...done.
> Loaded symbols for /lib/ld-linux.so.2
> #0 0x8059061 in UdmSplitCacheLog (log=300) at cache.c:552
> 552 logwords[count+j].wrd_id=table[w].wrd_id;
> (gdb) backtrace
> #0 0x8059061 in UdmSplitCacheLog (log=300) at cache.c:552
> #1 0x8049e89 in main (argc=1, argv=0xba94) at splitter.c:70
> #2 0x4009bbfc in __libc_start_main (main=0x8049d80 main, argc=1,
>    ubp_av=0xba94, init=0x80495bc _init, fini=0x8065b7c _fini,
>    rtld_fini=0x4000d674 _dl_fini, stack_end=0xba8c)
>    at ../sysdeps/generic/libc-start.c:118
>
> Since 3.1.10 is coming out today, I'll try it and see if things work
> better. If not, I'll post more bad news later ;)
>
> Z
>
> Reply: http://search.mnogo.ru/board/message.php?id=1320
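The crash site in the backtrace, the write to logwords[count+j] inside UdmSplitCacheLog, is consistent with the buffer-overflow hypothesis: if count+j can exceed the capacity allocated for logwords when a large log batch is split, the assignment silently corrupts memory until something segfaults. A hypothetical illustration of the defensive pattern (names and logic invented for the sketch; this is not the actual cache.c code):

```python
def append_words(logwords, capacity, count, new_records):
    """Append records at logwords[count+j], refusing to run past capacity.

    The C equivalent, logwords[count+j].wrd_id = ..., writes out of
    bounds when count+j >= capacity; checking first turns silent
    corruption into a recoverable error (in C one would realloc or
    flush instead of raising).
    """
    for j, rec in enumerate(new_records):
        if count + j >= capacity:
            raise OverflowError("logwords buffer full at index %d" % (count + j))
        logwords[count + j] = rec
    return count + len(new_records)

buf = [None] * 4
count = append_words(buf, 4, 0, ["w1", "w2", "w3"])
assert count == 3 and buf[:3] == ["w1", "w2", "w3"]
```

This would also explain why a given log file always crashes at the same tree file: the overflow point depends only on the log's contents, not on timing.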
Re: UdmSearch: Webboard: Splitter: core dumped
I've had this problem since they implemented cache mode. I've written about it several times (in detail). However, it appears that no one knows what it is; at first I thought it was my Alpha, 'til your email. Maybe Alex or Serge can help. I've also provided backtraces as well. I'll wait. For now, I'm indexing but running splitter when the files are around 2 MB.

--- Zenon Panoussis [EMAIL PROTECTED] wrote:
> Caffeinate The World skrev:
> > > I run splitter -p and finish fine. I then run splitter and,
> > > halfway through the splitting, crash: segmentation fault, or
> > > just a hang, core dumped. So I restart splitter and next time
> > > finish fine.
> >
> > what machine are you on? Alpha? OS?
>
> Intel PII, RH Linux 7.0 with 2.2 kernel.
>
> > i had the same problem and i sent a message to the mailing list
> > describing how i corrected it. search for "core" and "splitter"
>
> Found it. My dump appeared at a different position than yours, at
> 076, but was just as persistent as yours. Also, the premises are
> similar: I had run indexer for a long time and I had five 31 MB
> files waiting to be split. Splitter choked every time on the third
> one of them. This has never happened before or after when the logs
> have been smaller than 31 MB, so I'm just re-running smaller chunks
> at a time.
>
> > can you check another thing? i've never seen my splitter split the
> > last file "FFF.log". do you get that file? it goes as high as
> > FFE.log only.
>
> Indeed, last night I saw it stop at FFE.log. But I have had files at
> tree/FF/F/..., so I assume that other times it went all the way to
> FFF.
>
> Z
Re: UdmSearch: Webboard: slow indexing
--- Alexander Barkov [EMAIL PROTECTED] wrote:
> Author: Alexander Barkov
> Email: [EMAIL PROTECTED]
> Message:
> > I have a question. I have a servertable of about 20,000 urls, and I
> > was wondering if that is my performance bottleneck. It seems that
> > indexer takes all the cpu time on my machine now, and only indexes
> > about 20 urls per 10 minutes. Out of those 20,000 servers, I have
> > 690K documents. My indexer.conf file is basically:
> >
> > allownocase *.htm *.html *.pl
> > Robots yes
> > DeleteNoServer no
> > Deletebad yes
> > follow world
> > hops 3
> > Dbmode crc-multi
> > servertable server
>
> It is a known problem. We are thinking about how to solve it. For
> now, use the Realm command where possible. It allows you to describe
> several sites or even whole domains with only one command.
>
> Reply: http://search.mnogo.ru/board/message.php?id=1279

When I had Realm set to *, it just flies!! But I also ended up getting URLs I didn't want.
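The trade-off above can be shown as a small indexer.conf fragment: one Realm pattern replaces thousands of Server lines, at the cost of matching more than you may want (the *.mn.us pattern below is borrowed from the mirror threads elsewhere on this list as an example, not a recommendation):

```
# One wildcard pattern instead of thousands of Server commands:
Realm *.mn.us/*

# 'Realm *' is fastest of all, but tells indexer to follow and
# index everything it encounters.
```

The narrower the pattern, the less unwanted crawling; the wider, the fewer per-server lookups the indexer has to do.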
Re: UdmSearch: Webboard: Splitter: core dumped
i have a gig of ram here on my alpha. the only thing that has helped me was to change size_t to u_int32_t in cache.c and size_t to 'unsigned int' in cachelogd.c. the size issue is a 64-bit-related problem with the alpha. but with those changes, and splitting in smaller increments (i keep each file below 5 MB before i run splitter on it), things are fine. well, with the exception that i NEVER see FFF.log, so i think there is something that's not right with the calculations. i've indexed over a million URLs at one point and never once got the FFF.log. --- Zenon Panoussis [EMAIL PROTECTED] wrote: Caffeinate The World skrev: i'll wait. for now, i'm indexing but running splitter when the files are around 2 MB. I've been running indexer -c 3600 since last night, producing log files of 5-10 MB and running splitter every time afterwards, with cleaning of var/splitter and all. So far no problems at all. I have a hunch that the problem is tied to splitting multiple big files in one go. A friend offered to lend me some memory. If I can get my ass over there and fetch it, I'll try a huge splitting first with my standard 128 MB RAM and then with 1 GB RAM. If there is any difference in the behaviour of splitter, it will be a good indication of where to look for the problem. Z -- oracle@everywhere: The ephemeral source of the eternal truth..
Re: UdmSearch: Webboard: Splitter: core dumped
what machine are you on? Alpha? OS? i had the same problem and i sent a message to the mailing list describing how i corrected it. search for "core" and "splitter" can you check another thing? i've never seen my splitter split the last file "FFF.log". do you get that file? it goes as high as FFE.log only. --- Zenon Panoussis [EMAIL PROTECTED] wrote: Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: [3.1.9, cache mode] I run splitter -p and finish fine. I then run splitter and, halfway through the splitting, crash: segmentation fault, or just a hang, core dumped. So I restart splitter and next time finish fine. The question is: what can this do to the word database? Will it still be accurate, or will some words be inserted twice? Can I just re-run and finish and be happy, or should I re-index? Z Reply: http://search.mnogo.ru/board/message.php?id=1271
Re: Re[8]: UdmSearch: php-mnogo
--- Sergey Kartashoff [EMAIL PROTECTED] wrote: Hi! Friday, February 02, 2001, 8:57:28 AM, you wrote: CTW i modified libtool a bit and it compiled and apache didn't complain. CTW i'll try making a shared lib later. but upon testing it, it's VERY FAST. I have a question: have you compiled the threaded version of mnogosearch or not? no i've not. i'm on netbsd, we don't have native threads yet.
Re: UdmSearch: Webboard: Cache mode questions
http://search.freewinds.cx/cgi-bin/search2.cgi --- Alexander Barkov [EMAIL PROTECTED] wrote: What "New search" do you mean guys? I can't find it on this page. Caffeinate The World wrote: oops, ignore my last post, i forgot to use New Search. yes you are right. wow. yikes i mean. i don't know what's going on there. i don't think substring search is supported in cache mode. --- Zenon Panoussis [EMAIL PROTECTED] wrote: Author: Zenon Panoussis Email: [EMAIL PROTECTED] Message: The search works very nicely, but it returns a tremendous amount of quoted document data... Can I take a look at your search page? Yes. Go to http://search.freewinds.cx and use "New search". Search for the word "something" and format "Long" and you'll get a results page that's almost half a megabyte. BTW, there is some other strange behaviour there. Searching for beginning of word or substring doesn't work at all. Ispell is not enabled, but as I understand it, it doesn't need to be either.
UdmSearch: shared lib uses wrong path
when using --enable-shared, all client programs of mnogosearch look for their library in ".libs" instead of "$PREFIX/lib":
# ./search.cgi
Cannot open ".libs/libudmsearch.so"
# indexer -h
Cannot open ".libs/libudmsearch.so"
Re: UdmSearch: php-mnogo
--- Alexander Barkov [EMAIL PROTECTED] wrote: Does search.cgi work fine with "Minneapolis Elected Officials"? no, search.cgi doesn't. it appears as though pluralized words aren't searched properly, i.e. "official" would work, but not "officials". in addition, any capitalization will not work. i do have "IspellMode db" and "StopwordTable stopword" set. seems like some problems with suffix and ispell mode. i'm using 3.1.9. Caffeinate The World wrote: i modified libtool a bit and it compiled and apache didn't complain. i'll try making a shared lib later. but upon testing it, it's VERY FAST. i'm using cache mode and it's a few times faster than the CGI version. also my db is pgsql. when i say fast, i mean REALLY REALLY FAST. and this is with a server load of about 4.. average load is about 1 or .90. i'm running several indexers now. something that is strange is this: http://search.minnesota.com/test.php search for these words: city council minneapolis official first entry will be: ---cut--- 1. http://www.ci.minneapolis.mn.us/citywork/elected.html CONT : text/html TITLE: Minneapolis Elected Officials KEYWORDS: Minneapolis, City of Minneapolis, Minnesota, Twin Cities, City of Lakes, City Government, MN, MPLS, Municipal Government, Municipality, Local Government, Govern DESC: This is the official web site for the City of Minneapolis, Minnesota, USA. As a round-the-clock ser TEXT: Minneapolis Elected Officials View the City of Minneapolis Goals 1999 Goals 2000 Goals 2001 GoalsMayor Sharon Sayles BeltonCouncil Members About the City Council (roles and responsibilities)Ward 1 - Paul Ostrow Ward 2 - Joan SIZE : 9456 MODIFIED : 979051812 URLID : 517899 SCORE : 4 ---/cut--- look at the line "TEXT:" now if you use "Minneapolis Elected Officials" as your new search words, it will return 0 documents found. why? one thing to note is i had to wipe out my ./var/tree, keep the URLs in the db, expire all of the URLs, and run 'indexer -m' to reindex them. this process is VERY slow..
it seems as though it's 20 times slower than when i initially started w/o any URLs in the DB yet, just "server" commands. currently there are about 1/2 million urls in there; about 10,000 have been indexed. --- Sergey Kartashoff [EMAIL PROTECTED] wrote: Hi! Friday, February 02, 2001, 8:17:21 AM, you wrote: CTW i took out "-ludmsearch" from LIBS. recompiled: CTW those functions are still Undefined. for some reason the warnings seem CTW to indicate that it's looking for a shared libudmsearch.so? ok, we will discuss this problem. Maybe this is because you are using -export-dynamic in your ldflags. Anyway, you can try to compile/install libudmsearch as a shared library by using the --enable-shared configure switch while configuring mnogosearch. Try reinstalling it as a shared library and reconfigure/recompile/reinstall php. -- Regards, Sergey aka gluke.
Re: Re[2]: UdmSearch: php-mnogo-0.6
--- Sergey Kartashoff [EMAIL PROTECTED] wrote: Hi! Friday, February 02, 2001, 6:20:39 PM, you wrote: CTW this doesn't yet support ispell suffix or prefix mode does it? No, it will be done soon. CTW maybe this is why searches fail on pluralized words. also, search will CTW fail on any words with one or more letters capitalized. This is strange. Have you set up UDM_PARAM_CHARSET correctly? Udm_Set_Agent_Param($udm,UDM_PARAM_CHARSET,"iso-8859-1"); btw, what is the default charset if you don't specify one in indexer.conf?
Re: Re[4]: UdmSearch: php-mnogo-0.6
--- Sergey Kartashoff [EMAIL PROTECTED] wrote: Hi! Friday, February 02, 2001, 7:23:58 PM, you wrote: CTW maybe this is why searches fail on pluralized words. also, search will CTW fail on any words with one or more letters capitalized. This is strange. Have you set up UDM_PARAM_CHARSET correctly? CTW Udm_Set_Agent_Param($udm,UDM_PARAM_CHARSET,"iso-8859-1"); ok, i checked it. It is really a bug (with not finding capitalized words). I will try to fix it. Thank you! glad to hear! good job! if you don't mind my ranting, i'm here to find bugs ;-) just don't let my abundance of postings get to you. thanks and keep up the wonderful work. btw, it must be rather late in your country now? it's noon here. sleep does the body good. i tried it last night, and i feel better today.
solution (Re: UdmSearch: splitter core dump)
--- Alexander Barkov [EMAIL PROTECTED] wrote: No, unfortunately we haven't found the bug yet :-( Your last debug information should help. it appears i was right about size_t causing the problem on Alpha. Again, sizeof(size_t) on the Alpha is 8, while on i386 it's 4. i'm not exactly sure what contributed to the core dump, but my feeling is that because of the difference in size, cachelogd wrote the wrong records to disk only for certain words. the size difference also affects splitter. with this bug, record 77C.log always had the largest size. again, i'm not sure why. while most other records were less than 10K for each splitter run, 77C.log ranged from 100K to over 400K. i started over from scratch by re-indexing everything. i tried using the changed splitter and cachelogd on the existing ./var/tree data, but it caused more core dumps, not at 77C but at other locations. i believe the existing data were tainted, and therefore when splitter checked them for comparison or delete processing, it core dumped. but before that i made some changes to "cache.c" and "cachelogd.c". in "cache.c" i replaced all occurrences of "size_t" with "u_int32_t". for "cachelogd.c" i replaced all "size_t" with "unsigned int". please note that replacing "size_t" with "u_int32_t" in "cachelogd.c" will result in extremely high and always increasing server load. mine went from 1 to over 36 for server load after trying that. after erasing ./var/tree (is there a faster way than rm -rf?) and starting the new cachelogd, i started indexer. i've been running it for 3 days. i've used splitter 4 times and i've yet to get a core dump. i've tested this on raw data of about 2 MB or less. i've not let it climb to 30 MB like before. i'll do that soon, but the indexing process is extremely slow (4 indexers running, not threaded). maybe it's because of the 1/2 million expired urls in pgsql's db. Caffeinate The World wrote: hi alex, could you let me know if you found anything and if you have a patch for 3.1.9pre13.
i have indexers still going, just building up log files, and i can't split those large files unless i attend to the computer and watch the size of the logs. thanks. --- Alexander Barkov [EMAIL PROTECTED] wrote: We are trying to discover this bug now. Caffeinate The World wrote: mnogosearch 3.1.9-pre13, pgsql 7.1-current, netbsd/alpha 1.5.1-current running cache mode. i've been indexing and splitter-ing just fine, 'til today when, after an overnight of indexers running and gathering up a log file of over 31 MB, cachelogd automatically started a new log file. i ran 'splitter -p' on that 31 MB log file. it was split up just fine. then i ran 'splitter' and it core dumped almost half way thru.
cut ...
Delete from cache-file /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE1000
Delete from cache-file /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE2000
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old: 2 new: 4 total: 6
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old: 0 new: 2 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old: 1 new: 1 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old:27049 new:13718 total:40767
Segmentation fault - core dumped
/cut
here is the backtrace:
cut ...
#0 0x120018c44 in UdmSplitCacheLog (log= Cannot access memory at address 0x121f873bc. ) at cache.c:591
591 table[header.ntables].pos=pos;
(gdb) bt
#0 0x120018c44 in UdmSplitCacheLog (log= Cannot access memory at address 0x121f873bc. ) at cache.c:591
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0xc712f381000470e1
/cut
sorry, i don't think i compiled splitter with the debug flag on, so i don't have much more info.
here are the file sizes:
-rw-r--r-- 1 root wheel      4 Jan 14 10:56 77A.log
-rw-r--r-- 1 root wheel  11732 Jan 14 10:56 77B.log
-rw-r--r-- 1 root wheel 465360 Jan 14 10:56 77C.log   <-- unusually large
-rw-r--r-- 1 root wheel  73696 Jan 14 10:56 77D.log
-rw-r--r-- 1 root wheel  22764 Jan 14 10:56 77E.log
notice 77C.log, that's where it core dumped. it's unusually large. i think there is a bug in splitter. how do i continue with the splitter process at this point so that 77C.log and the others get processed?
Re: UdmSearch: php-mnogo
was the problem which caused a segmentation fault fixed? --- Sergey Kartashoff [EMAIL PROTECTED] wrote: Hi! Here is the php4 extension module which adds native libudmsearch function support to php. We uploaded it to the PHP CVS source tree, so it is expected that this module will be included in the next 4.0.5 release of php. This module is currently in an alpha state, but it is working and can be used already. It contains only basic support of libudmsearch features, but we are working on it and updating it in the PHP CVS source tree. I am sending the module to the list in its current state. Please feel free to report any bugs you find. Documentation about this module is currently unavailable; it will be done after some time. All functions it supports are shown in the test.php example script. Installation instructions:
1. create an ext/mnogosearch directory in the php sources
2. unpack all files from this tarball into ext/mnogosearch
3. delete the configure and main/php_config.h.in scripts from the php sources
4. run the buildconf script to recreate the configure, makefile templates and php_config.h.in files
5. now you can run configure --with-mnogosearch=dir --with-mysql=dir ... according to your needs
6. # make; make install
7. test the php-mnogo functions with the test.php script located in the tarball
-- Regards, Sergey aka gluke. ATTACHMENT part 2 application/x-compressed name=php-mnogo-0.5.tgz
Re: Re[4]: UdmSearch: php-mnogo
i took out "-ludmsearch" from LIBS. recompiled: ...
gmake[1]: Entering directory `/home/staffs/t/tom/work/php/php4-current/php4'
/bin/sh /home/staffs/t/tom/work/php/php4-current/php4/libtool --silent --mode=compile gcc -I. -I/home/staffs/t/tom/work/php/php4-current/php4/ -I/home/staffs/t/tom/work/php/php4-current/php4/main -I/home/staffs/t/tom/work/php/php4-current/php4 -I/usr/pkg/include/httpd -I/home/staffs/t/tom/work/php/php4-current/php4/Zend -I/usr/pkg/include/freetype -I/usr/pkg/include -I/usr/local/include -I/usr/local/include/mysql -I/usr/local/install/Sablot-0.44/include -I/home/staffs/t/tom/work/php/php4-current/php4/ext/xml/expat/xmltok -I/home/staffs/t/tom/work/php/php4-current/php4/ext/xml/expat/xmlparse -I/home/staffs/t/tom/work/php/php4-current/php4/TSRM -DNETBSD -DEAPI -DUSE_EXPAT -I/usr/pkg/include -DXML_BYTE_ORDER=12 -g -O2 -c stub.c
/bin/sh /home/staffs/t/tom/work/php/php4-current/php4/libtool --silent --mode=link gcc -I. -I/home/staffs/t/tom/work/php/php4-current/php4/ -I/home/staffs/t/tom/work/php/php4-current/php4/main -I/home/staffs/t/tom/work/php/php4-current/php4 -I/usr/pkg/include/httpd -I/home/staffs/t/tom/work/php/php4-current/php4/Zend -I/usr/pkg/include/freetype -I/usr/pkg/include -I/usr/local/include -I/usr/local/include/mysql -I/usr/local/install/Sablot-0.44/include -I/home/staffs/t/tom/work/php/php4-current/php4/ext/xml/expat/xmltok -I/home/staffs/t/tom/work/php/php4-current/php4/ext/xml/expat/xmlparse -I/home/staffs/t/tom/work/php/php4-current/php4/TSRM -DNETBSD -DEAPI -DUSE_EXPAT -I/usr/pkg/include -DXML_BYTE_ORDER=12 -g -O2 -Wl,-export-dynamic -Wl,-R/usr/lib -L/usr/lib -Wl,-R/usr/pkg/lib -L/usr/pkg/lib -Wl,-R/usr/local/lib -L/usr/local/lib -Wl,-R/usr/X11R6/lib -L/usr/X11R6/lib -o libphp4.la -rpath /home/staffs/t/tom/work/php/php4-current/php4/libs -avoid-version -L/usr/pkg/lib -L/usr/local/install/pgsql-current/lib -L/usr/local/install/mnogosearch-3.1.9/lib -L/usr/local/lib/mysql -L/usr/local/lib -L/usr/local/install/Sablot-0.44/lib -Wl,-export-dynamic -Wl,-R/usr/lib -L/usr/lib -Wl,-R/usr/pkg/lib -L/usr/pkg/lib -Wl,-R/usr/local/lib -L/usr/local/lib -Wl,-R/usr/X11R6/lib -L/usr/X11R6/lib -R /usr/pkg/lib -R /usr/local/install/pgsql-current/lib -R /usr/local/install/mnogosearch-3.1.9/lib -R /usr/local/lib/mysql -R /usr/local/lib -R /usr/local/install/Sablot-0.44/lib stub.lo Zend/libZend.la sapi/apache/libsapi.la main/libmain.la ext/gd/libgd.la ext/mnogosearch/libmnogosearch.la ext/mysql/libmysql.la ext/pcre/libpcre.la ext/pgsql/libpgsql.la ext/posix/libposix.la ext/sablot/libsablot.la ext/session/libsession.la ext/sockets/libsockets.la ext/standard/libstandard.la ext/sysvsem/libsysvsem.la ext/sysvshm/libsysvshm.la ext/xml/libxml.la ext/zlib/libzlib.la TSRM/libtsrm.la -lz -lxmltok -lxmlparse -lsablot -lpq -lmysqlclient -ludmsearch -lpq -lcrypt -lttf -lpng -lz -lgd -lresolv -lm -lcrypt -lgd -lpng -lz -lm -lc -lpng -ljpeg -lz -lttf -lintl -lXpm -lX11 -lresolv -lgcc
*** Warning: This library needs some functionality provided by -ludmsearch.
*** I have the capability to make that library automatically link in when
*** you link to this library. But I can only do this if you have a
*** shared version of the library, which you do not appear to have.
*** Warning: This library needs some functionality provided by -lgcc.
*** I have the capability to make that library automatically link in when
*** you link to this library. But I can only do this if you have a
*** shared version of the library, which you do not appear to have.
*** Warning: This library needs some functionality provided by -ludmsearch.
*** I have the capability to make that library automatically link in when
*** you link to this library. But I can only do this if you have a
*** shared version of the library, which you do not appear to have.
*** Warning: This library needs some functionality provided by -lgcc.
*** I have the capability to make that library automatically link in when
*** you link to this library. But I can only do this if you have a
*** shared version of the library, which you do not appear to have.
*** The inter-library dependencies that have been dropped here will be
*** automatically added whenever a program is linked with this library
*** or is declared to -dlopen it.
gmake[1]: Leaving directory `/home/staffs/t/tom/work/php/php4-current/php4'
Making all in pear
gmake[1]: Entering directory `/home/staffs/t/tom/work/php/php4-current/php4/pear'
gmake[1]: Leaving directory `/home/staffs/t/tom/work/php/php4-current/php4/pear'
# cd .libs
# ls
libphp4.la libphp4.lai libphp4.so
# ls -l
total 4178
lrwxr-xr-x 1 root users      13 Feb  2 00:11 libphp4.la -> ../libphp4.la
-rw-r--r-- 1 root users    1447 Feb  2 00:11 libphp4.lai
-rwxr-xr-x 1 root users 4266206 Feb  2 00:11 libphp4.so
# nm *.so | grep Udm
U UdmAllocAgent
U UdmAllocEnv
U UdmDBErrorCode
U UdmDBErrorMsg
U
Re: Re[2]: UdmSearch: php-mnogo
--- Sergey Kartashoff [EMAIL PROTECTED] wrote: Hi! CTW was the problem which caused a segmentation fault fixed? The problem was the mysql library bundled with php. If you compile php with --with-mysql it uses its own library to access mysql. And if you compile it with --with-mysql=DIR, then it uses the native mysqlclient library. If you use the native mysql library then everything should be ok. i just recompiled the Apache 1.3.12 PHP module with PHP from CVS as of today. it was compiled with:
export LIBS="-ludmsearch -lgd -lpng -lz -lm -lc -lpng -ljpeg -lz -lttf -lintl -lXpm -lX11"
export LDFLAGS="-Wl,-export-dynamic -Wl,-R/usr/lib -L/usr/lib -Wl,-R/usr/pkg/lib \
  -L/usr/pkg/lib -Wl,-R/usr/local/lib -L/usr/local/lib -Wl,-R/usr/X11R6/lib -L/usr/X11R6/lib"
./configure \
  --with-apxs \
  --with-sablot=/usr/local/install/Sablot-0.44 \
  --with-mnogosearch=/usr/local \
  --with-pgsql=/usr/local \
  --with-mysql=/usr/local \
  --enable-libgcc \
  --with-gnu-ld \
  --with-zlib \
  --with-system-regex \
  --with-config-file-path=/usr/local/etc \
  --enable-track-vars \
  --enable-force-cgi-redirect \
  --enable-discard-path \
  --enable-memory-limit \
  --enable-sysvsem \
  --enable-sysvshm \
  --enable-sockets \
  --with-gd=/usr/pkg \
  --with-ttf=/usr/pkg \
  --enable-freetype-4bit-antialias-hack
mnogosearch is 3.1.9, php-mnogo is 0.5 from your email to the list. upon restarting apache, i get:
Undefined symbol UdmFreeAgent
and apache didn't start up.
# nm libphp4.so | grep Udm
U UdmAllocAgent
U UdmAllocEnv
U UdmDBErrorCode
U UdmDBErrorMsg
U UdmEnvSetDBAddr
U UdmEnvSetDBMode
U UdmFind
U UdmFreeAgent
U UdmFreeResult
U UdmGetCharset
U UdmInit
note they are all undefined. strange.
# ls /usr/local/lib/libudm*
/usr/local/lib/libudmsearch.a /usr/local/lib/libudmsearch.la
so these should have been statically linked.
# nm /usr/local/lib/libudmsearch.a | grep UdmFreeAgent
01a0 T UdmFreeAgent
Please note that I was able to compile php-mnogo-0.1 and apache started fine and i was able to run the test.php script before.
i used the exact same procedures. i don't know why i'm seeing Undefined symbol for v0.5. Also, v0.1 did segmentation fault on me. NetBSD/Dec-Alpha 1.5, mnogosearch 3.1.9, php-mnogosearch 0.5.
Re: Re[6]: UdmSearch: php-mnogo
i modified libtool a bit and it compiled and apache didn't complain. i'll try making a shared lib later. but upon testing it, it's VERY FAST. i'm using cache mode and it's a few times faster than the CGI version. also my db is pgsql. when i say fast, i mean REALLY REALLY FAST. and this is with a server load of about 4.. average load is about 1 or .90. i'm running several indexers now. something that is strange is this: http://search.minnesota.com/test.php search for these words: city council minneapolis official first entry will be: ---cut--- 1. http://www.ci.minneapolis.mn.us/citywork/elected.html CONT : text/html TITLE: Minneapolis Elected Officials KEYWORDS: Minneapolis, City of Minneapolis, Minnesota, Twin Cities, City of Lakes, City Government, MN, MPLS, Municipal Government, Municipality, Local Government, Govern DESC: This is the official web site for the City of Minneapolis, Minnesota, USA. As a round-the-clock ser TEXT: Minneapolis Elected Officials View the City of Minneapolis Goals 1999 Goals 2000 Goals 2001 GoalsMayor Sharon Sayles BeltonCouncil Members About the City Council (roles and responsibilities)Ward 1 - Paul Ostrow Ward 2 - Joan SIZE : 9456 MODIFIED : 979051812 URLID : 517899 SCORE : 4 ---/cut--- look at the line "TEXT:" now if you use "Minneapolis Elected Officials" as your new search words, it will return 0 documents found. why? one thing to note is i had to wipe out my ./var/tree, keep the URLs in the db, expire all of the URLs, and run 'indexer -m' to reindex them. this process is VERY slow.. it seems as though it's 20 times slower than when i initially started w/o any URLs in the DB yet, just "server" commands. currently there are about 1/2 million urls in there; about 10,000 have been indexed. --- Sergey Kartashoff [EMAIL PROTECTED] wrote: Hi! Friday, February 02, 2001, 8:17:21 AM, you wrote: CTW i took out "-ludmsearch" from LIBS. recompiled: CTW those functions are still Undefined.
for some reason the warnings seem CTW to indicate that it's looking for a shared libudmsearch.so? ok, we will discuss this problem. Maybe this is because you are using -export-dynamic in your ldflags. Anyway, you can try to compile/install libudmsearch as a shared library by using the --enable-shared configure switch while configuring mnogosearch. Try reinstalling it as a shared library and reconfigure/recompile/reinstall php. -- Regards, Sergey aka gluke.
UdmSearch: mnogo what does it mean?
it's been bugging me for some time now.. what does mnogo stand for? or what does it mean?
Re: UdmSearch: splitter still core dumps on 3.1.9
that was a little premature on my part. it did core dump again at 77C when i tried to split another log file. argh. --- Caffeinate The World [EMAIL PROTECTED] wrote: overnight, the "new splitter" using "u_int32_t" was able to split a log file of around 31 MB. this is the first time i've seen it able to index the log at 77C. can you verify that linux and such have "u_int32_t"? if it does, i'll submit my patch. this should fix the problem with the alpha. also, the patch should enable NetBSD to compile cleanly because we don't have native threads yet. i'll do some more tests before i can make this official. --- Caffeinate The World [EMAIL PROTECTED] wrote: just a quick note, i changed all occurrences of "size_t" in cache.c into "u_int32_t" and recompiled splitter. it seems as though it doesn't core dump on log files like before. note that i had to do "splitter -p" to get new files in ./splitter and then run "splitter". i've only been able to test this on a small set of logs. related to this, i also changed "size_t" in cachelogd.c to "unsigned int". for some reason, if i changed it to "u_int32_t", my server ran at a very high load.. usually it sits at around 1, but if i ran cachelogd with the "u_int32_t" changes, it ran at over 30 for system load. scary. --- Caffeinate The World [EMAIL PROTECTED] wrote: NetBSD/Alpha (64bit). I reported this a while back for 3.1.9pre13. Looks like it was not fixed for 3.1.9. I'm using cache mode. # gdb /usr/local/install/mnogosearch-3.1.9/sbin/splitter GNU gdb 4.17 Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "alpha--netbsd"...
(gdb) run -f 77c -t 77c
Starting program: /usr/local/install/mnogosearch-3.1.9/sbin/splitter -f 77c -t 77c
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0B000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0C000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old: 0 new: 6 total: 6
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1C000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1F000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old: 0 new: 2 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old: 0 new:36482 total:36482
Program received signal SIGSEGV, Segmentation fault.
0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
601   table[header.ntables].wrd_id=logwords[t-1].wrd_id;
(gdb) bt
#0 0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
#1 0x120002ae0 in main (argc=1917, argv=0x1f8c0) at splitter.c:70
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0x1400013550
This warning occurs if you are debugging a function without any symbols (for example, in a stripped executable). In that case, you may wish to increase the size of the search with the `set heuristic-fence-post' command. Otherwise, you told GDB there was a function where there isn't one, or (more likely) you have encountered a bug in GDB.
(gdb) l
596             logwords[count].weight=0;
597
598             for(t=1;t<count+1;t++){
599                     if((logwords[t-1].wrd_id!=logwords[t].wrd_id)||
600                        (logwords[t-1].weight!=logwords[t].weight)){
601                             table[header.ntables].wrd_id=logwords[t-1].wrd_id;
602                             table[header.ntables].weight=logwords[t-1].weight;
603                             table[header.ntables].pos=pos;
604                             table[header.ntables].len=t*
Re: UdmSearch: splitter still core dumps on 3.1.9
another interesting thing to note: using the old log files created by the "old" cachelogd (size_t instead of unsigned int), if "splitter" core dumped, and i remove the file which caused it, i.e. rm /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 and redo "splitter -f 77c -t 77c", it won't core dump anymore. at this point, i'm thinking the old cachelogd wrote records that were wrong because of size_t (sizeof 8 instead of 4 on the Alpha); once such a "wrong" record is written to the tree and the next splitter run needs to load 77C3 to delete some records, that's where the problem occurs. i'm going to restart everything again. this time, i won't use the "old" log files from the cachelogd which had "size_t". i'll just stick to the modified cachelogd (with unsigned int) and splitter with cache.c using "u_int32_t". --- Caffeinate The World [EMAIL PROTECTED] wrote: --- Caffeinate The World [EMAIL PROTECTED] wrote: that was a little premature on my part. it did core dump again at 77C when i tried to split another log file. argh. it should be noted that i used log files from 3.1.9pre13. these log files were processed with cachelogd where i hadn't changed size_t to "unsigned int" yet, in which case it could have written the "wrong" record length or something. the very first batch i processed with the "new" indexer (u_int32_t) was created with the cachelogd (with size_t changed to unsigned int). that batch went fine. no core. then i processed the old 31 MB log from a cachelogd where it was still using size_t. this 31 MB log file was also written ok. but when i processed another "older" log file, that's when it cored again at 77C. it could be that the older log files, where cachelogd had size_t, are causing the problems. --- Caffeinate The World [EMAIL PROTECTED] wrote: overnight, the "new splitter" using "u_int32_t" was able to split a log file of around 31 MB. this is the first time i've seen it able to index the log at 77C.
can you verify that linux and such have "u_int32_t"? if it does, i'll submit my patch. this should fix the problem with the alpha. the patch should also enable NetBSD to compile cleanly, because we don't have native threads yet. i'll do some more tests before i can make this official.

--- Caffeinate The World [EMAIL PROTECTED] wrote: just a quick note, i changed all occurrences of "size_t" in cache.c into "u_int32_t" and recompiled splitter. it seems as though it doesn't core dump on log files like before. note that i had to do "splitter -p" to get new files in ./splitter and then run "splitter". i've only been able to test this on a small set of logs. related to this, i also changed "size_t" in cachelogd.c to "unsigned int". for some reason, if i changed it to "u_int32_t", my server ran at a very high load. usually it sits at around 1, but if i ran cachelogd with the "u_int32_t" changes, it ran at over 30 for system load. scary.

--- Caffeinate The World [EMAIL PROTECTED] wrote: NetBSD/Alpha (64bit). I reported this a while back for 3.1.9pre13. Looks like it was not fixed for 3.1.9. I'm using cache mode.

# gdb /usr/local/install/mnogosearch-3.1.9/sbin/splitter
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "alpha--netbsd"...
(gdb) run -f 77c -t 77c Starting program: /usr/local/install/mnogosearch-3.1.9/sbin/splitter -f 77c -t 77c /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0B000 old: 0 new: 1 total: 1 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0C000 old: 0 new: 1 total: 1 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old: 0 new: 6 total: 6 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1C000 old: 0 new: 1 total: 1 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1F000 old: 0 new: 1 total: 1 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old: 0 new: 1 total: 1 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old: 0 new: 2 total: 2 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old:
Re: UdmSearch: Webboard: indexer will not index Site
how about showing us what your configuration looks like, and how you are running indexer (with what parameters, etc.)?

--- Werner Bruns [EMAIL PROTECTED] wrote: Author: Werner Bruns Email: [EMAIL PROTECTED] Message: Hello there, regardless of what I'm trying, the indexer is doing nothing. First I modified the indexer.conf (hopefully correctly); all it did was index the file "robots.txt", that's it. Second, I used the minimal version of the indexer.conf. In between I flushed the DB. Nothing!!! Database statistics: Expired 0 Total 0 I'm using mnogosearch-3.1.9 and MySQL 3.22.32. So, what am I doing wrong??? Reply: http://search.mnogo.ru/board/message.php?id=1192
Possible Fix? (Re: UdmSearch: DeleteNoServer still broken in 3.1.9)
alex or serge, could you look over this patch? i believe this patch should fix the problem described below:

---cut---
# diff -ru indexer.c.orig indexer.c
--- indexer.c.orig      Tue Jan 30 10:45:03 2001
+++ indexer.c   Tue Jan 30 10:47:29 2001
@@ -368,7 +368,7 @@
        }
        /* Find correspondent Server record from indexer.conf */
-       if(!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr))){
+       if((!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr))) && (!CurSrv->delete_no_server)){
                UdmLog(Indexer,UDM_LOG_WARN,"No 'Server' command for url... deleted.");
                if(!strcmp(CurURL.filename,"robots.txt")){
                        if(IND_OK==(result=UdmDeleteRobotsFromHost(Indexer,CurURL.hostinfo)))
---/cut---

--- Caffeinate The World [EMAIL PROTECTED] wrote: i reported this back in 3.1.9pre13. i have 'DeleteNoServer no' set, with many URL's in my sql db not having associated Server commands. here i just tried to reindex and i see that my URL is being deleted:

# indexer -m -s 200
Indexer[2397]: indexer from mnogosearch-3.1.9/PgSQL started with '/usr/local/install/mnogosearch-3.1.9/etc/indexer.conf' jobs
Indexer[2397]: [1] http://www.mnworkforcecenter.org/lmi/pub1/mms/index.htm
Indexer[2397]: [1] No 'Server' command for url... deleted.
^C Received signal 2 - exit!

(NOTE: i had to Ctrl-C it to stop it from deleting more URL's.)
here is my full indexer.conf:

---cut---
#Include inc1.conf
DBAddr pgsql://***:*@/work/
DBMode cache
#SyslogFacility local7
LogdAddr localhost:7000
LocalCharset iso-8859-1
Ispellmode db
StopwordTable stopword
#ServerTable server
DeleteNoServer no
#Allow *
#Disallow NoMatch *.state.mn.us/*
Disallow http://www.rootsweb.com/~mn*
Disallow http://www.wxusa.com/*
Disallow http://www.vitalrec.com/*
Disallow http://*yahoo.com/*
Disallow http://*aol.com/*
Disallow http://www.salescircular.com/*
Disallow http://*.wellsfargo.com/*
# Disallow any except known extensions and directory index using "regex" match:
Disallow NoMatch Regex \/$|\/SMTMall|\.htm$|\.html$|\.shtml$|\.jhtml$|\.phtml$|\.php$|\.php3$|\.asp|\.txt$
# Exclude cgi-bin and non-parsed-headers using "string" match:
Disallow */cgi-bin/* *.cgi */nph-*
# Exclude anything with '?' sign in URL. Note that '?' sign has a
# special meaning in "string" match, so we have to use "regex" match here:
#Disallow Regex \?
# Exclude some known extensions using fast "String" match:
Disallow *.b *.sh *.md5 *.rpm
Disallow *.arj *.tar *.zip *.tgz *.gz *.z *.bz2
Disallow *.lha *.lzh *.rar *.zoo *.ha *.tar.Z
Disallow *.gif *.jpg *.jpeg *.bmp *.tiff *.tif *.xpm *.xbm *.pcx
Disallow *.vdo *.mpeg *.mpe *.mpg *.avi *.movie *.mov *.dat
Disallow *.mid *.mp3 *.rm *.ram *.wav *.aiff *.ra
Disallow *.vrml *.wrl *.png
Disallow *.exe *.com *.cab *.dll *.bin *.class *.ex_
Disallow *.tex *.texi *.xls *.doc *.texinfo
Disallow *.rtf *.pdf *.cdf *.ps
Disallow *.ai *.eps *.ppt *.hqx
Disallow *.cpt *.bms *.oda *.tcl
Disallow *.o *.a *.la *.so
Disallow *.pat *.pm *.m4 *.am *.css
Disallow *.map *.aif *.sit *.sea
Disallow *.m3u *.qt *.mov
# Exclude Apache directory list in different sort order using "string" match:
Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D
# More complicated case. RAR .r00-.r99, ARJ a00-a99 files
# and unix shared libraries.
# We use "Regex" match type here:
Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
#CheckOnly *.b *.sh *.md5
#CheckOnly *.arj *.tar *.zip *.tgz *.gz
#CheckOnly *.lha *.lzh *.rar *.zoo *.tar.Z
#CheckOnly *.gif *.jpg *.jpeg *.bmp *.tiff
#CheckOnly *.vdo *.mpeg *.mpe *.mpg *.avi *.movie
#CheckOnly *.mid *.mp3 *.rm *.ram *.wav *.aiff
#CheckOnly *.vrml *.wrl *.png
#CheckOnly *.exe *.cab *.dll *.bin *.class
#CheckOnly *.tex *.texi *.xls *.doc *.texinfo
#CheckOnly *.rtf *.pdf *.cdf *.ps
#CheckOnly *.ai *.eps *.ppt *.hqx
#CheckOnly *.cpt *.bms *.oda *.tcl
#CheckOnly *.rpm *.m3u *.qt *.mov
#CheckOnly *.map *.aif *.sit *.sea
#
# or check ANY except known text extensions using "regex" match:
#Check NoMatch Regex \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$
#HrefOnly */mail*.html */thread*.html
UseRemoteContentType yes
AddType text/plain *.txt *.pl *.js *.h *.c *.pm *.e
AddType text/html *.html *.htm *.m
AddType image/x-xpixmap *.xpm
AddType image/x-xbitmap *.xbm
AddType image/gif *.gif
AddType Regex \.r[0-9][0-9]$
AddType application/unknown *.*
#Mime application/msword "text/plain; charset=cp1251" "catdoc $1"
#Mime application/x-troff-man text/plain "deroff"
#Mime text/x-postscript text/plain "ps2ascii"
Re: UdmSearch: Webboard: Crash! Tainted prefix dirs
what in particular crashes? what mode do you use? etc.?

--- Mario Gray [EMAIL PROTECTED] wrote: Author: Mario Gray Email: [EMAIL PROTECTED] Message: Mnogo 3.1.9 still crashes very often. Anyone have this experience as well? Reply: http://search.mnogo.ru/board/message.php?id=1195
Re: UdmSearch: Webboard: How to index meta customizedtag=...
--- Chen Zhang [EMAIL PROTECTED] wrote: Author: Chen Zhang Email: [EMAIL PROTECTED] Message: According to the udmsearch documentation, the indexer can grab contents in title, meta description, meta keywords, body, url, url path... But I have thousands of files with the keywords in a format such as: meta specialword=" 'name|chen' 'place|new_york' 'telephone|212_9876374' " How do I configure or change the source code to index the keywords 'name|chen', 'place|new_york', and 'telephone|212_9876374' into the database? meta Description="..." Any suggestions are highly appreciated. Chen Zhang Reply: http://search.mnogo.ru/board/message.php?id=1193
Re: Possible Fix? (Re: UdmSearch: DeleteNoServer still broken in 3.1.9)
oops, that didn't work. but i'm pretty sure we need to test for the condition of delete_no_server here. i also tried:

---cut---
	/* Find correspondent Server record from indexer.conf */
	if(!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr))){
		if(Indexer->Conf->csrv->delete_no_server){
			UdmLog(Indexer,UDM_LOG_WARN,"No 'Server' command for url... deleted.");
			if(!strcmp(CurURL.filename,"robots.txt")){
				if(IND_OK==(result=UdmDeleteRobotsFromHost(Indexer,CurURL.hostinfo)))
					result=UdmLoadRobots(Indexer);
			}else{
				result=IND_OK;
			}
			if(result==IND_OK)result=UdmDeleteUrl(Indexer,Doc->url_id);
			FreeDoc(Doc);
			return(result);
		}
	}
---/cut---

but that didn't work either. any ideas?

--- Caffeinate The World [EMAIL PROTECTED] wrote: alex or serge, could you look over this patch? i believe this patch should fix the problem described below:

---cut---
# diff -ru indexer.c.orig indexer.c
--- indexer.c.orig      Tue Jan 30 10:45:03 2001
+++ indexer.c   Tue Jan 30 10:47:29 2001
@@ -368,7 +368,7 @@
        }
        /* Find correspondent Server record from indexer.conf */
-       if(!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr))){
+       if((!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr))) && (!CurSrv->delete_no_server)){
                UdmLog(Indexer,UDM_LOG_WARN,"No 'Server' command for url... deleted.");
                if(!strcmp(CurURL.filename,"robots.txt")){
                        if(IND_OK==(result=UdmDeleteRobotsFromHost(Indexer,CurURL.hostinfo)))
---/cut---

--- Caffeinate The World [EMAIL PROTECTED] wrote: i reported this back in 3.1.9pre13. i have 'DeleteNoServer no' set, with many URL's in my sql db not having associated Server commands. here i just tried to reindex and i see that my URL is being deleted:

# indexer -m -s 200
Indexer[2397]: indexer from mnogosearch-3.1.9/PgSQL started with '/usr/local/install/mnogosearch-3.1.9/etc/indexer.conf' jobs
Indexer[2397]: [1] http://www.mnworkforcecenter.org/lmi/pub1/mms/index.htm
Indexer[2397]: [1] No 'Server' command for url... deleted.
^C Received signal 2 - exit!

(NOTE: i had to Ctrl-C it to stop it from deleting more URL's.)
here is my full indexer.conf:

---cut---
#Include inc1.conf
DBAddr pgsql://***:*@/work/
DBMode cache
#SyslogFacility local7
LogdAddr localhost:7000
LocalCharset iso-8859-1
Ispellmode db
StopwordTable stopword
#ServerTable server
DeleteNoServer no
#Allow *
#Disallow NoMatch *.state.mn.us/*
Disallow http://www.rootsweb.com/~mn*
Disallow http://www.wxusa.com/*
Disallow http://www.vitalrec.com/*
Disallow http://*yahoo.com/*
Disallow http://*aol.com/*
Disallow http://www.salescircular.com/*
Disallow http://*.wellsfargo.com/*
# Disallow any except known extensions and directory index using "regex" match:
Disallow NoMatch Regex \/$|\/SMTMall|\.htm$|\.html$|\.shtml$|\.jhtml$|\.phtml$|\.php$|\.php3$|\.asp|\.txt$
# Exclude cgi-bin and non-parsed-headers using "string" match:
Disallow */cgi-bin/* *.cgi */nph-*
# Exclude anything with '?' sign in URL. Note that '?' sign has a
# special meaning in "string" match, so we have to use "regex" match here:
#Disallow Regex \?
# Exclude some known extensions using fast "String" match:
Disallow *.b *.sh *.md5 *.rpm
Disallow *.arj *.tar *.zip *.tgz *.gz *.z *.bz2
Disallow *.lha *.lzh *.rar *.zoo *.ha *.tar.Z
Disallow *.gif *.jpg *.jpeg *.bmp *.tiff *.tif *.xpm *.xbm *.pcx
Disallow *.vdo *.mpeg *.mpe *.mpg *.avi *.movie *.mov *.dat
Disallow *.mid *.mp3 *.rm *.ram *.wav *.aiff *.ra
Disallow *.vrml *.wrl *.png
Disallow *.exe *.com *.cab *.dll *.bin *.class *.ex_
Disallow *.tex *.texi *.xls *.doc *.texinfo
Disallow *.rtf *.pdf *.cdf *.ps
Disallow *.ai *.eps *.ppt *.hqx
Disallow *.cpt *.bms *.oda *.tcl
Disallow *.o *.a *.la *.so
Disallow *.pat *.pm *.m4 *.am *.css
Disallow *.map *.aif *.sit *.sea
Disallow *.m3u *.qt *.mov
# Exclude Apache directory list in different sort order using "string" match:
Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D
# More complicated case. RAR .r00-.r99, ARJ a00-a99 files
# and unix shared libraries.
# We use "Regex" match type here:
Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
#CheckOnly *.b *.sh *.md5
#CheckOnly *.arj *.tar *.zip *.tgz *.gz
#CheckOnly *.lha *.lzh *.rar *.zoo *.tar.Z
#CheckOnly *.gif *.jpg *.jpeg *.bmp *.tiff
#CheckOnly *.vdo *.mpeg *.mpe *.mpg *.avi *.movie
#CheckOnly *.mid *.mp3 *.rm *.ram *.wav *.aiff
#CheckOnly *.vrml *.wrl *.png
#CheckOnly *.exe *.cab *.dll *.bin *.class
#CheckOnly *.tex *.texi *.xls *.doc *.texinfo
UdmSearch: Server order
if indexer follows the order of the Server commands in the indexer.conf file, so that subsections are indexed before parent sections:

Server http://host/depth1/depth2/
Server http://host/

how do you specify such an order in the ServerTable used in SQL?
UdmSearch: splitter still core dumps on 3.1.9
NetBSD/Alpha (64bit). I reported this a while back for 3.1.9pre13. Looks like it was not fixed for 3.1.9. I'm using cache mode. # gdb /usr/local/install/mnogosearch-3.1.9/sbin/splitter GNU gdb 4.17 Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "alpha--netbsd"... (gdb) run -f 77c -t 77c Starting program: /usr/local/install/mnogosearch-3.1.9/sbin/splitter -f 77c -t 77c /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0B000 old: 0 new: 1 total: 1 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0C000 old: 0 new: 1 total: 1 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old: 0 new: 6 total: 6 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1C000 old: 0 new: 1 total: 1 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1F000 old: 0 new: 1 total: 1 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old: 0 new: 1 total: 1 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old: 0 new: 2 total: 2 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old: 0 new: 1 total: 1 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old: 0 new: 1 total: 1 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old: 0 new:36482 total:36482 Program received signal SIGSEGV, Segmentation fault. 
0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
601			table[header.ntables].wrd_id=logwords[t-1].wrd_id;
(gdb) bt
#0  0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
#1  0x120002ae0 in main (argc=1917, argv=0x1f8c0) at splitter.c:70
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0x1400013550
This warning occurs if you are debugging a function without any symbols (for example, in a stripped executable). In that case, you may wish to increase the size of the search with the `set heuristic-fence-post' command. Otherwise, you told GDB there was a function where there isn't one, or (more likely) you have encountered a bug in GDB.
(gdb) l
596			logwords[count].weight=0;
597
598			for(t=1;t<count+1;t++){
599				if((logwords[t-1].wrd_id!=logwords[t].wrd_id)||
600				   (logwords[t-1].weight!=logwords[t].weight)){
601					table[header.ntables].wrd_id=logwords[t-1].wrd_id;
602					table[header.ntables].weight=logwords[t-1].weight;
603					table[header.ntables].pos=pos;
604					table[header.ntables].len=t*sizeof(UDM_CACHEWORD)-pos;
605					pos+=table[header.ntables].len;
(gdb)
Re: UdmSearch: splitter still core dumps on 3.1.9
just a quick note, i changed all occurrences of "size_t" in cache.c into "u_int32_t" and recompiled splitter. it seems as though it doesn't core dump on log files like before. note that i had to do "splitter -p" to get new files in ./splitter and then run "splitter". i've only been able to test this on a small set of logs. related to this, i also changed "size_t" in cachelogd.c to "unsigned int". for some reason, if i changed it to "u_int32_t", my server ran at a very high load. usually it sits at around 1, but if i ran cachelogd with the "u_int32_t" changes, it ran at over 30 for system load. scary.

--- Caffeinate The World [EMAIL PROTECTED] wrote: NetBSD/Alpha (64bit). I reported this a while back for 3.1.9pre13. Looks like it was not fixed for 3.1.9. I'm using cache mode.

# gdb /usr/local/install/mnogosearch-3.1.9/sbin/splitter
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "alpha--netbsd"...
(gdb) run -f 77c -t 77c
Starting program: /usr/local/install/mnogosearch-3.1.9/sbin/splitter -f 77c -t 77c
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0B000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0C000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old: 0 new: 6 total: 6
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1C000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1F000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old: 0 new: 2 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old: 0 new:36482 total:36482

Program received signal SIGSEGV, Segmentation fault.
0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
601			table[header.ntables].wrd_id=logwords[t-1].wrd_id;
(gdb) bt
#0  0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
#1  0x120002ae0 in main (argc=1917, argv=0x1f8c0) at splitter.c:70
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0x1400013550
This warning occurs if you are debugging a function without any symbols (for example, in a stripped executable). In that case, you may wish to increase the size of the search with the `set heuristic-fence-post' command. Otherwise, you told GDB there was a function where there isn't one, or (more likely) you have encountered a bug in GDB.
(gdb) l
596			logwords[count].weight=0;
597
598			for(t=1;t<count+1;t++){
599				if((logwords[t-1].wrd_id!=logwords[t].wrd_id)||
600				   (logwords[t-1].weight!=logwords[t].weight)){
601					table[header.ntables].wrd_id=logwords[t-1].wrd_id;
602					table[header.ntables].weight=logwords[t-1].weight;
603					table[header.ntables].pos=pos;
604					table[header.ntables].len=t*sizeof(UDM_CACHEWORD)-pos;
605					pos+=table[header.ntables].len;
(gdb)
UdmSearch: more splitter crashes
i've been seeing splitter coredump consistently at this point: # /usr/local/install/mnogosearch-3.1.9/sbin/splitter.old -f 92e -t 92e /usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E06000 old: 8 new: 1 total: 9 /usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E0D000 old: 19 new: 3 total: 22 /usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E12000 old: 7 new: 1 total: 8 /usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E36000 old: 72 new: 9 total: 81 /usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E48000 old: 63 new: 3 total: 66 /usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E5F000 old: 220 new: 1 total: 221 /usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E7E000 old: 1 new: 1 total: 2 /usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E97000 old:4044 new: 41 total:4085 /usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92EA3000 old: 74 new: 1 total: 75 /usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92EB old: 192 new: 4 total: 196 /usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92EB1000 old: 5 new: 1 total: 6 /usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92ED5000 old: 248 new: 4 total: 252 Segmentation fault - core dumped # gdb /usr/local/install/mnogosearch-3.1.9/sbin/splitter.old splitter.old.core GNU gdb 4.17 Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "alpha--netbsd"... Core was generated by `splitter.old'. Program terminated with signal 11, Segmentation fault. Reading symbols from /usr/libexec/ld.elf_so...done. Reading symbols from /usr/lib/libcrypt.so.0...done. Reading symbols from /usr/local/lib/libpq.so.2...done. Reading symbols from /usr/lib/libc.so.12...done. 
#0  UdmSplitCacheLog (log=0) at cache.c:546
Source file is more recent than executable.
546
(gdb) bt
#0  UdmSplitCacheLog (log=0) at cache.c:546
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0x34d4a3a70fd62
This warning occurs if you are debugging a function without any symbols (for example, in a stripped executable). In that case, you may wish to increase the size of the search with the `set heuristic-fence-post' command. Otherwise, you told GDB there was a function where there isn't one, or (more likely) you have encountered a bug in GDB.
(gdb) l
541		int j;
542
543		/*printf("Read old: %s\n",fname);*/
544		read(oldfd,&header,sizeof(header));
545		read(oldfd,table,header.ntables*sizeof(UDM_CACHETABLE));
546
547		for(w=0;w<header.ntables;w++){
548			int c=0;
549			int num=table[w].len/sizeof(UDM_CACHEWORD);
550			while((r=(num-c))>0){
(gdb)
Re: UdmSearch: Webboard: 'IspellMode db' and Postgres
---cut---
DROP TABLE "affix";
DROP TABLE "spell";
CREATE TABLE "affix" (
	"flag" character varying(1) DEFAULT '' NOT NULL,
	"type" character varying(1) DEFAULT '' NOT NULL,
	"lang" character varying(3) DEFAULT '' NOT NULL,
	"mask" character varying(32) DEFAULT '' NOT NULL,
	"find" character varying(32) DEFAULT '' NOT NULL,
	"repl" character varying(32) DEFAULT '' NOT NULL
);
CREATE TABLE "spell" (
	"word" character varying(64) DEFAULT '' NOT NULL,
	"flag" character varying(32) DEFAULT '' NOT NULL,
	"lang" character varying(3) DEFAULT '' NOT NULL
);
CREATE INDEX affix_flag ON affix (flag);
CREATE INDEX spell_word ON spell (word);
---/cut---

--- Nick Wellnhofer [EMAIL PROTECTED] wrote: Author: Nick Wellnhofer Email: [EMAIL PROTECTED] Message: I tried to use the database ispell support ('IspellMode db') with Postgres, but I couldn't find a create/pgsql/ispell.txt file to create the database tables. I tried to modify the ispell.txt from the mysql directory, but with no success. Does anybody know how to get 'IspellMode db' running on Postgres? I think it could give some speed improvement. Nick Reply: http://search.mnogo.ru/board/message.php?id=1169
Re: UdmSearch: Webboard: MP3 file causes Segmentation fault(core dumped)
can you provide a backtrace from gdb? without it, it would be hard to track down the problem.

--- Adrift [EMAIL PROTECTED] wrote: Author: Adrift Email: [EMAIL PROTECTED] Message: whenever I try to index an MP3 file, I get a Segmentation fault (core dumped) message and the indexer quits... How do I fix this? Thanks, Ari Reply: http://search.mnogo.ru/board/message.php?id=1157
Re: UdmSearch: Speed and Indexes...
i had the same weird problem on pgsql, using crc-multi mode. i switched to cache mode, and now my queries are under a second.

--- Matthew Sullivan [EMAIL PROTECTED] wrote: Hi All, Just a few thoughts to throw around - currently I am running the search against a MySQL backend on a Sun Ultra 10 (single UltraSparc 440MHz CPU) with 1 Gig RAM and 18 Gig of drive space. If I log in to the mysql server, connect to the database, and perform a search on the word 'test' - using the crc-multi indexed data and the sql command:

select * from ndict4 where (word_id='-662733300');

i get: 6844 rows in set (30.97 sec), and searching a 2nd time: 6844 rows in set (10.05 sec). ndict4 contains 2051909 rows. If I then search on 'customer' [select * from ndict6 where (word_id='-175892837');] the result is: 2264 rows in set (7.51 sec), then: 2264 rows in set (3.15 sec). ndict6 contains 1415176 rows. To me this seems an awfully long time to perform searches (especially on 1 word) - the mysql server has been tuned roughly and currently consumes 400M of physical RAM, and there are 95000ish documents in the database - consuming 933M of disk... Questions: 1/ would it appear that I need to tune the MySQL server further? 2/ are these search times extended or do they seem ok? 3/ is there any way of speeding the searches up? Using UltraSeek on the same words, the results are gathered and rendered in under 1 second (the logs report the queries take 350ms) -- Yours Matthew -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- I'm really easy to get along with once you people learn to worship me.
FIX: Re: UdmSearch: php udm module (php-udm.0.1.tar.gz) returned no results
--- Caffeinate The World [EMAIL PROTECTED] wrote: mnogosearch 3.1.9pre13 (cache mode) http://www.izhcom.ru/~bar/php-udm.0.1.tar.gz NetBSD/DEC-Alpha 1.5.1. i made this module with php4.0.4. it compiled just fine. i changed the db access in udmsearch.php and was able to access pgsql fine. but no matter what i change "lake" to in: // Stage 3: perform search $res=Udm_Find($udm,"lake"); the search always returns: Documents 1-0 from 0 total found. i'm using words that i can find with the regular search.cgi (C version). is this module compatible with 'cache' mode in 3.1.9pre13? i guess there is a reason why it's v0.1.

here is the patch to fix it for any mode besides "single". i use "cache", and it worked with both the php cgi and the apache module.

--- php_udm.c.orig	Sun Jan 21 02:23:57 2001
+++ php_udm.c	Sun Jan 21 02:02:38 2001
@@ -178,6 +178,7 @@
 	Env=UdmAllocEnv();
 	Agent=UdmAllocAgent(Env,0,0);
 	UdmEnvSetDBAddr(Env,dbaddr);
+	UdmEnvSetDBMode(Env,dbmode);
 	ZEND_REGISTER_RESOURCE(return_value,Agent,le_link);
 	}
 	break;

yahoo will probably mess up the line breaks, but i hope you get the point.
UdmSearch: anyone using 3.1.9pre13, CacheMode, and on Alpha?
if you are using 3.1.9pre13, CacheMode, and are on Alpha, can you verify that 'splitter -p' creates only 4095 files (000-FFE) instead of 4096 (000-FFF) in ./var/splitter? i can't seem to locate why this is, in the function 'UdmPreSplitCacheLog()' from ./src/cache.c. my guess is some kind of type mismatch, i.e. size_t (on 64bit Alpha its sizeof is 8 instead of 4), or the structure definitions in ./include/udm_cache.h using time_t - again, on the Alpha it's only 4 wide instead of 8 for sizeof.
Re: UdmSearch: splitter core dump
further testing shows that it's because size_t is unsigned int on intel, but on the alpha it's unsigned long. i'm on the alpha. it's understandable why we'd overrun the array buffer, since i'd have one huge number on the alpha.

--- Alexander Barkov [EMAIL PROTECTED] wrote: Thanks for debugging! This will help us.

Caffeinate The World wrote: i put in a printf to help track down the problem:

for(t=1;t<count+1;t++){
	/* Debug to test array out of bound */
	printf("Count:%4d, headerntables:%4d, Array Index t:%4d\n",count,header.ntables,t);
	if((logwords[t-1].wrd_id!=logwords[t].wrd_id)||
	   (logwords[t-1].weight!=logwords[t].weight)){
		table[header.ntables].wrd_id=logwords[t-1].wrd_id;
		table[header.ntables].weight=logwords[t-1].weight;
		table[header.ntables].pos=pos;
		table[header.ntables].len=t*sizeof(UDM_CACHEWORD)-pos;
		pos+=table[header.ntables].len;
		header.ntables++;
	}
}

after running splitter on the file 77C.log, i get:

...
Count:35996, headerntables:8328, Array Index t:35571
Count:35996, headerntables:8328, Array Index t:35572
Count:35996, headerntables:8328, Array Index t:35573
Count:35996, headerntables:8329, Array Index t:35574
Segmentation fault - core dumped

looks as if the array index is out of bound?

--- Alexander Barkov [EMAIL PROTECTED] wrote: We are trying to discover this bug now.

Caffeinate The World wrote: mnogosearch 3.1.9-pre13, pgsql 7.1-current, netbsd/alpha 1.5.1-current running cachemode. i've been indexing and splitter-ing just fine, 'til today, when, after an overnight of indexers running and gathering up a log file of over 31 MB, cachelogd automatically started a new log file. i ran 'splitter -p' on that 31 MB log file. it was split up just fine. then i ran 'splitter' and it core dumped almost halfway thru. cut ...
Delete from cache-file /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE1000
Delete from cache-file /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE2000
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old: 2 new: 4 total: 6
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old: 0 new: 2 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old: 1 new: 1 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old:27049 new:13718 total:40767
Segmentation fault - core dumped
/cut

here is the backtrace:

cut ...
#0  0x120018c44 in UdmSplitCacheLog (log=Cannot access memory at address 0x121f873bc.) at cache.c:591
591     table[header.ntables].pos=pos;
(gdb) bt
#0  0x120018c44 in UdmSplitCacheLog (log=Cannot access memory at address 0x121f873bc.) at cache.c:591
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0xc712f381000470e1
/cut

sorry, i don't think i compiled splitter with the debug flag on, so i don't have much more info. here are the file sizes:

-rw-r--r--  1 root  wheel       4 Jan 14 10:56 77A.log
-rw-r--r--  1 root  wheel   11732 Jan 14 10:56 77B.log
-rw-r--r--  1 root  wheel  465360 Jan 14 10:56 77C.log
-rw-r--r--  1 root  wheel   73696 Jan 14 10:56 77D.log
-rw-r--r--  1 root  wheel   22764 Jan 14 10:56 77E.log

notice 77C.log, that's where it core dumped. it's unusually large. i think there is a bug in splitter. how do i continue with the splitter process at this point so that 77C.log and others get processed?
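the crash at `table[header.ntables].pos=pos;` suggests header.ntables runs past the end of the allocated table. a defensive bound check (hypothetical helper mirroring the names in the snippet above, not the actual mnogosearch code) would fail loudly instead of corrupting memory:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical mirror of splitter's table-building step. */
typedef struct { unsigned wrd_id, weight, pos, len; } table_ent;

/* Append one entry, refusing to write past the end of 'table'.
   Returns 0 on success, -1 on overflow so the caller can abort
   the split cleanly instead of segfaulting. */
int add_table_entry(table_ent *table, size_t ntables_max,
                    size_t *ntables, unsigned wrd_id, unsigned weight,
                    unsigned pos, unsigned len) {
    if (*ntables >= ntables_max) {
        fprintf(stderr, "table overflow at %lu entries\n",
                (unsigned long)*ntables);
        return -1;
    }
    table[*ntables].wrd_id = wrd_id;
    table[*ntables].weight = weight;
    table[*ntables].pos    = pos;
    table[*ntables].len    = len;
    (*ntables)++;
    return 0;
}
```

a check like this would not fix the underlying size miscalculation, but it would turn the core dump into a diagnosable error message.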
UdmSearch: Re: splitter core dump @ 77C
sizeof(int) = 4, sizeof(size_t) = 8

--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Caffeinate The World wrote: did you look into this alex?
 Yes, thanks for the report. We are trying to find the reason for the bug.
 if not, i'll recompile with debug on for splitter and will try to locate the problem myself. i think it has to do with the 64bit platform and wrong expected numbers.
 What is sizeof(int) on your platform?
Re: UdmSearch: no FFF tree in cachemode tree structure
yes, that's correct, i'm on NetBSD/Dec-Alpha 64bit. you guys need to look over the use of size_t; see my other email message. on the Alpha it's unsigned long, on intel it's unsigned int. this would cause the big difference.

--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Caffeinate The World wrote: from cachemode.txt /var/tree/00/0/0 ... /var/tree/00/0/000FF ... ... /var/tree/FF/F/FFF00 ... /var/tree/FF/F/F in 3.1.9pre13, i've never seen splitter break the tree into /var/tree/FF/F/, only /var/tree/FF/E/... is the highest. is that a bug?
 Probably this is because of Tru64? We'll check the code against platform independence.
 also the filename is actually 8 hex chars instead of just 5.
 It is changed in 3.1.9 sources. Files now are 8 characters in length.
time_t is int on Alpha (was Re: UdmSearch: no FFF tree in cachemode tree structure)
so i covered the problems with size_t in my other email. while trying to figure out why i'm missing the FFF tree node, i saw in include/udm_cache.h that there are many references to time_t. you should know that on 64bit Alpha, time_t is an int, whereas on the other platforms time_t is a long. so with size_t in cache.c and time_t in include/udm_cache.h, i can see why i'm having these problems on my 64bit Alpha.

--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Caffeinate The World wrote: from cachemode.txt /var/tree/00/0/0 ... /var/tree/00/0/000FF ... ... /var/tree/FF/F/FFF00 ... /var/tree/FF/F/F in 3.1.9pre13, i've never seen splitter break the tree into /var/tree/FF/F/, only /var/tree/FF/E/... is the highest. is that a bug?
 Probably this is because of Tru64? We'll check the code against platform independence.
 also the filename is actually 8 hex chars instead of just 5.
 It is changed in 3.1.9 sources. Files now are 8 characters in length.
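the general cure for time_t/size_t width differences in a shared on-disk format is to serialize header fields byte-by-byte at a fixed width and byte order, instead of fwrite()-ing whole structs. a small sketch of that idea (hypothetical helpers, not from the mnogosearch sources):

```c
#include <stdint.h>

/* Write/read a 32-bit value as 4 little-endian bytes. Because the byte
   layout is spelled out, the file format no longer depends on the host's
   sizeof(time_t), sizeof(size_t), struct padding, or endianness. */
void put_u32le(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

uint32_t get_u32le(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

a header timestamp would then be written as `put_u32le(buf, (uint32_t)time(NULL))` on any platform, and an Alpha and an Intel box could read each other's cache files.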
RE: UdmSearch: Webboard: Search never finds any records...
ok, i'm going to assume you're very new. i mean no offense by this, but mnogosearch is really confusing to begin with. in the 'etc' dir where you have your indexer.conf file, you should have another file, 'search.htm'. i'm going to assume you are using search.cgi instead of the perl or php version. in search.htm, you need to make sure that DBAddr and DBMode match those set in indexer.conf. if you don't have those set correctly, you won't get any results back from your search. that was my problem: i didn't set DBMode in search.htm to match indexer.conf.

--- John Dispirito [EMAIL PROTECTED] wrote: Could you be more specific? I've used the basic settings in the stock conf file and it still doesn't work, but when i get a status of the indexer, it's indexed like 2200 sites completely, but still no results..

-----Original Message-----
From: Caffeinate The World [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 17, 2001 7:54 PM
To: John Dispirito; [EMAIL PROTECTED]
Subject: Re: UdmSearch: Webboard: Search never finds any records...

check your settings in search.htm. i had the exact same problem when i first started using mnogosearch.

--- John Dispirito [EMAIL PROTECTED] wrote:
Author: John Dispirito
Email: [EMAIL PROTECTED]
Message: I have a problem. I'm running UDMsearch 3.0.23, and it successfully spiders all of my sites (about 150), but whenever I try to search for anything, it never finds any information, no matter how simple the search query... My search.conf file is default except for the changes to the dbaddr line and the crc-multi line. my indexer.conf file is here; I've omitted the urls I'm searching, but they were in the format Server http://www.url.org Any ideas?

=-=-=-=-indexer.conf file=-=-=-=-=-=-
#
# This is an indexer.conf sample for 'ftpsearch' mode.
# Indexer will index only the URL but not the content
# of the documents.
#
DBHost localhost
DBName udmsearch
DBUser root
# Turn on indexing URL of the documents
UrlWeight 1
# Do not process robots.txt. It is usually used on HTTP servers only
Robots no
URL INFO WENT HERE
# Retrieve only directory list, use HEAD for other files.
CheckOnly [^/]$
# Exclude Apache and Squid directory lists in different sort order
Disallow \?D=A$ \?D=A$ \?D=D$ \?M=A$ \?M=D$ \?N=A$ \?N=D$ \?S=A$ \?S=D$
# Exclude ./. and ./.. from directory list
Disallow /[.]{1,2} /\%2e /\%2f

Reply: http://search.mnogo.ru/board/message.php?id=1141
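to make the "match your settings" advice above concrete, the relevant lines just need to be identical in both files. a sketch with placeholder values (host, database, and credentials are examples, not from the thread):

```
# indexer.conf
DBAddr  mysql://user:pass@localhost/search/
DBMode  crc-multi

# search.htm -- must carry the exact same values,
# otherwise search.cgi looks in the wrong tables and finds nothing
DBAddr  mysql://user:pass@localhost/search/
DBMode  crc-multi
```

if indexer writes with one DBMode and search.cgi reads with another, indexing appears to succeed (indexer -S shows thousands of documents) while every query returns zero results, which matches the symptom described.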
Re: UdmSearch: splitter core dump
i put in a printf to help track down the problem:

for(t=1;t<count+1;t++){
    /* Debug to test array out of bound */
    printf("Count:%4d, headerntables:%4d, Array Index t:%4d\n",count,header.ntables,t);
    if((logwords[t-1].wrd_id!=logwords[t].wrd_id)||
       (logwords[t-1].weight!=logwords[t].weight)){
        table[header.ntables].wrd_id=logwords[t-1].wrd_id;
        table[header.ntables].weight=logwords[t-1].weight;
        table[header.ntables].pos=pos;
        table[header.ntables].len=t*sizeof(UDM_CACHEWORD)-pos;
        pos+=table[header.ntables].len;
        header.ntables++;
    }
}

after running splitter on the file 77C.log, i get:

...
Count:35996, headerntables:8328, Array Index t:35571
Count:35996, headerntables:8328, Array Index t:35572
Count:35996, headerntables:8328, Array Index t:35573
Count:35996, headerntables:8329, Array Index t:35574
Segmentation fault - core dumped

looks as if the array index is out of bound?

--- Alexander Barkov [EMAIL PROTECTED] wrote: We are trying to discover this bug now.

Caffeinate The World wrote: mnogosearch 3.1.9-pre13, pgsql 7.1-current, netbsd/alpha 1.5.1-current running cachemode. i've been indexing and splitter-ing just fine, 'til today: after an overnight of indexers running and gathering up a log file of over 31 MB, cachelogd automatically started a new log file. i ran 'splitter -p' on that 31 MB log file. it was split up just fine. then i ran 'splitter' and it core dumped almost half way through.

cut ...
Delete from cache-file /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE1000
Delete from cache-file /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE2000
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old: 2 new: 4 total: 6
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old: 0 new: 2 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old: 1 new: 1 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old:27049 new:13718 total:40767
Segmentation fault - core dumped
/cut

here is the backtrace:

cut ...
#0  0x120018c44 in UdmSplitCacheLog (log=Cannot access memory at address 0x121f873bc.) at cache.c:591
591     table[header.ntables].pos=pos;
(gdb) bt
#0  0x120018c44 in UdmSplitCacheLog (log=Cannot access memory at address 0x121f873bc.) at cache.c:591
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0xc712f381000470e1
/cut

sorry, i don't think i compiled splitter with the debug flag on, so i don't have much more info. here are the file sizes:

-rw-r--r--  1 root  wheel       4 Jan 14 10:56 77A.log
-rw-r--r--  1 root  wheel   11732 Jan 14 10:56 77B.log
-rw-r--r--  1 root  wheel  465360 Jan 14 10:56 77C.log
-rw-r--r--  1 root  wheel   73696 Jan 14 10:56 77D.log
-rw-r--r--  1 root  wheel   22764 Jan 14 10:56 77E.log

notice 77C.log, that's where it core dumped. it's unusually large. i think there is a bug in splitter. how do i continue with the splitter process at this point so that 77C.log and others get processed?
Re: UdmSearch: Webboard: Search never finds any records...
check your settings in search.htm. i had the exact same problem when i first started using mnogosearch.

--- John Dispirito [EMAIL PROTECTED] wrote:
Author: John Dispirito
Email: [EMAIL PROTECTED]
Message: I have a problem. I'm running UDMsearch 3.0.23, and it successfully spiders all of my sites (about 150), but whenever I try to search for anything, it never finds any information, no matter how simple the search query... My search.conf file is default except for the changes to the dbaddr line and the crc-multi line. my indexer.conf file is here; I've omitted the urls I'm searching, but they were in the format Server http://www.url.org Any ideas?

=-=-=-=-indexer.conf file=-=-=-=-=-=-
#
# This is an indexer.conf sample for 'ftpsearch' mode.
# Indexer will index only the URL but not the content
# of the documents.
#
DBHost localhost
DBName udmsearch
DBUser root
# Turn on indexing URL of the documents
UrlWeight 1
# Do not process robots.txt. It is usually used on HTTP servers only
Robots no
URL INFO WENT HERE
# Retrieve only directory list, use HEAD for other files.
CheckOnly [^/]$
# Exclude Apache and Squid directory lists in different sort order
Disallow \?D=A$ \?D=A$ \?D=D$ \?M=A$ \?M=D$ \?N=A$ \?N=D$ \?S=A$ \?S=D$
# Exclude ./. and ./.. from directory list
Disallow /[.]{1,2} /\%2e /\%2f

Reply: http://search.mnogo.ru/board/message.php?id=1141
UdmSearch: how do i index mall of america (it uses servlet)
this site uses some weird servlet and i can't index it. the error i get from indexer is: no content-type in ...

http://www.mallofamerica.com/
http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369&pn=STATIC&frame=main&rs=0&file=General/general.html

i didn't disallow the '?' in indexer.conf, and i've added 'servlet' to:

Disallow NoMatch Regex \/$|\.htm$|\.html$|\.shtml$|\.phtml$|\.php$|\.php3$|\.asp$|servlet|\.txt$
Re: UdmSearch: how do i index mall of america (it uses servlet)
here is more info:

Indexer[9843]: [1] http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369&pn=STATIC&frame=main&rs=0&file=General/general.html
Indexer[9843]: [1] Realm string 'http://*'
Indexer[9843]: [1] Allow by default
Indexer[9843]: [1] HTTP/1.1 200 ok
Indexer[9843]: [1] Server: Microsoft-IIS/4.0
Indexer[9843]: [1] Date: Thu, 18 Jan 2001 04:23:27 GMT
Indexer[9843]: [1] content-type:text/html
Indexer[9843]: [1] Set-Cookie:LangID=0;Expires=Sat, 18-Jan-2003 04:23:27 GMT;Path=/
Indexer[9843]: [1] Cache-Control:no-cache="set-cookie,set-cookie2"
Indexer[9843]: [1] Expires:Thu, 01 Dec 1994 16:00:00 GMT
Indexer[9843]: [1] HTTP/1.1 200 ok ? 15968
Indexer[9843]: [1] No Content-type in 'http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369&pn=STATIC&frame=main&rs=0&file=General/general.html'!

as you can see it shows:

Indexer[9843]: [1] content-type:text/html

yet it complains that it didn't have Content-type. i know mnogo's comparison is not case-sensitive, so why the error? is it because of the lack of space after the colon?

--- Caffeinate The World [EMAIL PROTECTED] wrote: this site uses some weird servlet and i can't index it. the error i get from indexer is: no content-type in ... http://www.mallofamerica.com/ http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369&pn=STATIC&frame=main&rs=0&file=General/general.html i didn't disallow the '?' in indexer.conf, and i've added 'servlet' to: Disallow NoMatch Regex \/$|\.htm$|\.html$|\.shtml$|\.phtml$|\.php$|\.php3$|\.asp$|servlet|\.txt$
patch for indexer to fix Content-Type [was Re: UdmSearch: how do i index mall of america (it uses servlet)]
here is a patch for 3.1.9-pre13 to fix the cases where some web servers don't follow the specs and omit the space between the ':' and the content type, i.e. they send

Content-Type:text/html

instead of

Content-Type: text/html

--- indexer.c.orig	Thu Jan 18 02:06:08 2001
+++ indexer.c	Thu Jan 18 01:44:07 2001
@@ -802,7 +802,8 @@
 			!UDM_STRNCASECMP(sname,"IIS"))
 				Indexer->charset=UDM_CHARSET_CP1251;
 	}else
-	if(!UDM_STRNCASECMP(tok,"Content-Type: ")){
+	if(!UDM_STRNCASECMP(tok,"Content-Type: ")||
+	   !UDM_STRNCASECMP(tok,"Content-Type:")){
 		if (!Indexer->Conf->use_remote_cont_type) {
 			content_type=UdmContentType(Indexer->Conf,Doc->url);
 		}

--- Caffeinate The World [EMAIL PROTECTED] wrote: here is more info: the indexer log shows "content-type:text/html" in the response headers, yet it complains "No Content-type in 'http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369&pn=STATIC&frame=main&rs=0&file=General/general.html'!". i know mnogo's comparison is not case-sensitive, so why the error? is it because of the lack of space after the colon?

--- Caffeinate The World [EMAIL PROTECTED] wrote: this site uses some weird servlet and i can't index it. the error i get from indexer is: no content-type in ...
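the patch above special-cases the missing space. a slightly more general approach, sketched below with a hypothetical helper (not from the mnogosearch sources), is to match the header name case-insensitively up to the colon and then skip any run of spaces or tabs, which tolerates zero, one, or many whitespace characters after ':':

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* If 'line' starts with header 'name' followed by ':', return a pointer
   to the header value with leading spaces/tabs skipped; otherwise NULL. */
const char *header_value(const char *line, const char *name) {
    size_t n = strlen(name);
    if (strncasecmp(line, name, n) != 0 || line[n] != ':')
        return NULL;
    line += n + 1;
    while (*line == ' ' || *line == '\t')
        line++;
    return line;
}
```

this handles both "Content-Type: text/html" and the IIS-style "content-type:text/html" from the log above with one code path, instead of enumerating each spacing variant.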
UdmSearch: incorrect Follow behavior
using v3.1.9pre13. i have, in short:

DeleteNoServer no
Follow path
Realm *

with no Server variable set for this URL. why does indexer -i -u http://www.gorp.com/gorp/location/mn/mn.htm add other paths from www.gorp.com? let me illustrate:

# indexer -C -u http://www.gorp.com/%
You are going to delete database 'mnwork' content
Are you sure?(YES/no)YES
Deleting...Done
# indexer -i -u http://www.gorp.com/gorp/location/mn/mn.htm
Indexer[7327]: indexer from mnogosearch-3.1.9.pre13/PgSQL started with '/usr/local/install/mnogosearch-3.1.9/etc/indexer.conf'
Indexer[7327]: [1] http://www.gorp.com/gorp/location/mn/mn.htm
Indexer[7327]: [1] Done (9 seconds)
---/cut---

mnwork=# select url from url where url like 'http://www.gorp.com/%';
 http://www.gorp.com/
 http://www.gorp.com/default.htm
 http://www.gorp.com/gorp/about.htm
 http://www.gorp.com/gorp/activity/byway/MN.htm
 http://www.gorp.com/gorp/activity/main.htm
 http://www.gorp.com/gorp/activity/paddling/wsr_mwgl.htm
 http://www.gorp.com/gorp/books/main.htm
 http://www.gorp.com/gorp/eclectic/family/minn_family.htm
 http://www.gorp.com/gorp/freelance/
 http://www.gorp.com/gorp/gear/main.htm
 http://www.gorp.com/gorp/guide.htm
 http://www.gorp.com/gorp/interact/default.htm
 http://www.gorp.com/gorp/jobs/
 http://www.gorp.com/gorp/jobs/gorpjobs.htm
 http://www.gorp.com/gorp/location/MN/MN.htm
 http://www.gorp.com/gorp/location/MN/MN_e.htm
 http://www.gorp.com/gorp/location/MN/MN_feats.htm
 http://www.gorp.com/gorp/location/MN/MN_links.htm
 http://www.gorp.com/gorp/location/MN/MN_maps.htm
 http://www.gorp.com/gorp/location/MN/MN_ne.htm
 http://www.gorp.com/gorp/location/MN/MN_nw.htm
 http://www.gorp.com/gorp/location/MN/MN_resource.htm
 http://www.gorp.com/gorp/location/MN/MN_se.htm
 http://www.gorp.com/gorp/location/MN/MN_sw.htm
 http://www.gorp.com/gorp/location/MN/MN_w.htm
 http://www.gorp.com/gorp/location/cities/main.htm
 http://www.gorp.com/gorp/location/cities/minneapolis.htm
 http://www.gorp.com/gorp/location/main.htm
 http://www.gorp.com/gorp/location/mn/
 http://www.gorp.com/gorp/location/mn/mn.htm
 http://www.gorp.com/gorp/location/mn/we_twincities.htm
 http://www.gorp.com/gorp/location/mn/xc_gun.htm
 http://www.gorp.com/gorp/location/us/us.htm
 more URLs

if 'Follow path' is set, shouldn't it default to that? shouldn't it ONLY add URLs like http://www.gorp.com/gorp/location/mn/*, which would fall in the same path?
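the expected 'Follow path' behavior described above amounts to a same-directory prefix test on the URL. a sketch of that test (hypothetical helper, not the actual mnogosearch implementation):

```c
#include <stdbool.h>
#include <string.h>

/* True when 'url' lives under the directory of 'base', i.e. shares
   everything up to and including base's last '/'. A full implementation
   would also normalize the scheme/host; this only compares prefixes. */
bool same_path(const char *base, const char *url) {
    const char *slash = strrchr(base, '/');
    size_t n;
    if (slash == NULL)
        return false;
    n = (size_t)(slash - base) + 1;  /* include the trailing '/' */
    return strncmp(base, url, n) == 0;
}
```

note that the query output above contains both .../location/MN/MN.htm and .../location/mn/mn.htm as distinct rows; a strncmp-based check like this is likewise case-sensitive in the path, which is consistent with URL path semantics.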
UdmSearch: no FFF tree in cachemode tree structure
from cachemode.txt:

/var/tree/00/0/0 ... /var/tree/00/0/000FF ...
...
/var/tree/FF/F/FFF00 ... /var/tree/FF/F/F

in 3.1.9pre13, i've never seen splitter break the tree into /var/tree/FF/F/; /var/tree/FF/E/... is the highest. is that a bug? also, the filename is actually 8 hex chars instead of just 5.
Re: UdmSearch: splitter core dump
--- Alexander Barkov [EMAIL PROTECTED] wrote: We are trying to discover this bug now.

i've now had 8 files that made splitter core dump, all of them at 'var/tree/77/C/77C3'. i hope that was more detail for you; there is a pattern there.

Caffeinate The World wrote: mnogosearch 3.1.9-pre13, pgsql 7.1-current, netbsd/alpha 1.5.1-current running cachemode. i've been indexing and splitter-ing just fine, 'til today: after an overnight of indexers running and gathering up a log file of over 31 MB, cachelogd automatically started a new log file. i ran 'splitter -p' on that 31 MB log file. it was split up just fine. then i ran 'splitter' and it core dumped almost half way through.

cut ...
Delete from cache-file /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE1000
Delete from cache-file /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE2000
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old: 2 new: 4 total: 6
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old: 0 new: 2 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old: 1 new: 1 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old:27049 new:13718 total:40767
Segmentation fault - core dumped
/cut

here is the backtrace:

cut ...
#0  0x120018c44 in UdmSplitCacheLog (log=Cannot access memory at address 0x121f873bc.) at cache.c:591
591     table[header.ntables].pos=pos;
(gdb) bt
#0  0x120018c44 in UdmSplitCacheLog (log=Cannot access memory at address 0x121f873bc.) at cache.c:591
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0xc712f381000470e1
/cut

sorry, i don't think i compiled splitter with the debug flag on, so i don't have much more info. here are the file sizes:

-rw-r--r--  1 root  wheel       4 Jan 14 10:56 77A.log
-rw-r--r--  1 root  wheel   11732 Jan 14 10:56 77B.log
-rw-r--r--  1 root  wheel  465360 Jan 14 10:56 77C.log
-rw-r--r--  1 root  wheel   73696 Jan 14 10:56 77D.log
-rw-r--r--  1 root  wheel   22764 Jan 14 10:56 77E.log

notice 77C.log, that's where it core dumped. it's unusually large. i think there is a bug in splitter. how do i continue with the splitter process at this point so that 77C.log and others get processed?
UdmSearch: getting URLs from sub tree of dmoz without indexing dmoz
using 3.1.9pre13: i've been able to figure out most situations and how to index sites. now i'm stuck at trying to get the site URLs listed in dmoz, and then only index those URLs found.

DBAddr pgsql://user:pass@/mydb/
DBMode cache
LogdAddr localhost:7000
Ispellmode db
StopwordTable stopword
DeleteNoServer no
HrefOnly Match String *dmoz*
Disallow String http://www.dmoz.org/*
#Allow *
Disallow NoMatch Regex \/$|\.htm$|\.html$|\.shtml$|\.phtml$|\.php$|\.php3$|\.txt$
Disallow */cgi-bin/* *.cgi */nph-*
Disallow Regex \?
Index yes
Follow path
Server world http://www.dmoz.org/Regional/North_America/United_States/California/
Realm http://*
---/cut---

that will not work. it inserts the server URL above into the db and that's it; it won't even traverse the subtree and grab all the site URLs. i don't want to index anything at dmoz, just get the URLs listed for each sub category, then index those sites.
Re: UdmSearch: getting URLs from sub tree of dmoz without indexing dmoz
--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
 Hi! Tuesday, January 16, 2001, 8:19:06 AM, you wrote:
 CTW> HrefOnly Match String *dmoz*
 CTW> Disallow String http://www.dmoz.org/*
 CTW> that will not work. it will insert the server url above into the db and that's it. won't even traverse the subtree and grab all the site urls.
 It will not work because you disallowed everything under http://www.dmoz.org/

that was just one combination i tried. here is another that will not work under 3.1.9pre13:

DBAddr pgsql://user:pass@/mnwork/
DBMode cache
LogdAddr localhost:7000
Ispellmode db
StopwordTable stopword
DeleteNoServer no
HrefOnly Match Regex .*dmoz.*Minnesota.*
Allow http://www.dmoz.org/Regional/North_America/United_States/Minnesota/*
Disallow http://www.dmoz.org/*
#Allow *
Disallow NoMatch Regex \/$|\.htm$|\.html$|\.shtml$|\.phtml$|\.php$|\.php3$|\.txt$
Disallow */cgi-bin/* *.cgi */nph-*
Disallow Regex \?
Index yes
Follow path
Server world http://www.dmoz.org/Regional/North_America/United_States/Minnesota/Weather
Realm http://*
---\cut---

# indexer -u %dmoz%
Indexer[4800]: indexer from mnogosearch-3.1.9.pre13/PgSQL started with '/usr/local/install/mnogosearch-3.1.9/etc/indexer.conf'
Tue 16 00:49:24 [29262] Client #0 connected
Indexer[4800]: [1] http://www.dmoz.org/Regional/North_America/United_States/Minnesota/Weather
Indexer[4800]: [1] http://www.dmoz.org/robots.txt
Indexer[4800]: [1] Done (19 seconds)
Tue 16 00:49:43 [29262] Client #0 left
---\cut---

why didn't it pick up the URLs of the sites listed under the Weather category?
UdmSearch: splitter core dump
mnogosearch 3.1.9-pre13, pgsql 7.1-current, netbsd/alpha 1.5.1-current running cachemode. i've been indexing and splitter-ing just fine, 'til today: after an overnight of indexers running and gathering up a log file of over 31 MB, cachelogd automatically started a new log file. i ran 'splitter -p' on that 31 MB log file. it was split up just fine. then i ran 'splitter' and it core dumped almost half way through.

cut ...
Delete from cache-file /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE1000
Delete from cache-file /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE2000
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old: 2 new: 4 total: 6
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old: 0 new: 2 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old: 1 new: 1 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old:27049 new:13718 total:40767
Segmentation fault - core dumped
/cut

here is the backtrace:

cut ...
#0  0x120018c44 in UdmSplitCacheLog (log=Cannot access memory at address 0x121f873bc.) at cache.c:591
591     table[header.ntables].pos=pos;
(gdb) bt
#0  0x120018c44 in UdmSplitCacheLog (log=Cannot access memory at address 0x121f873bc.) at cache.c:591
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0xc712f381000470e1
/cut

sorry, i don't think i compiled splitter with the debug flag on, so i don't have much more info. here are the file sizes:

-rw-r--r--  1 root  wheel       4 Jan 14 10:56 77A.log
-rw-r--r--  1 root  wheel   11732 Jan 14 10:56 77B.log
-rw-r--r--  1 root  wheel  465360 Jan 14 10:56 77C.log
-rw-r--r--  1 root  wheel   73696 Jan 14 10:56 77D.log
-rw-r--r--  1 root  wheel   22764 Jan 14 10:56 77E.log

notice 77C.log, that's where it core dumped. it's unusually large. i think there is a bug in splitter. how do i continue with the splitter process at this point so that 77C.log and others get processed?
Re: UdmSearch: splitter core dump
--- Caffeinate The World [EMAIL PROTECTED] wrote: mnogosearch 3.1.9-pre13, pgsql 7.1-current, netbsd/alpha 1.5.1-current running cachemode. i've been indexing and splitter-ing just fine, 'til today: after an overnight of indexers running and gathering up a log file of over 31 MB, cachelogd automatically started a new log file. i ran 'splitter -p' on that 31 MB log file. it was split up just fine. then i ran 'splitter' and it core dumped almost half way through.

cut ...
Delete from cache-file /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE1000
Delete from cache-file /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE2000
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old: 2 new: 4 total: 6
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old: 0 new: 2 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old: 0 new: 1 total: 1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old: 1 new: 1 total: 2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old:27049 new:13718 total:40767
Segmentation fault - core dumped
/cut

here is the backtrace:

cut ...
#0  0x120018c44 in UdmSplitCacheLog (log=Cannot access memory at address 0x121f873bc.) at cache.c:591
591     table[header.ntables].pos=pos;
(gdb) bt
#0  0x120018c44 in UdmSplitCacheLog (log=Cannot access memory at address 0x121f873bc.) at cache.c:591
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0xc712f381000470e1
/cut

sorry, i don't think i compiled splitter with the debug flag on, so i don't have much more info. here are the file sizes:

-rw-r--r--  1 root  wheel       4 Jan 14 10:56 77A.log
-rw-r--r--  1 root  wheel   11732 Jan 14 10:56 77B.log
-rw-r--r--  1 root  wheel  465360 Jan 14 10:56 77C.log
-rw-r--r--  1 root  wheel   73696 Jan 14 10:56 77D.log
-rw-r--r--  1 root  wheel   22764 Jan 14 10:56 77E.log

notice 77C.log, that's where it core dumped. it's unusually large. i think there is a bug in splitter. how do i continue with the splitter process at this point so that 77C.log and others get processed?

what i ended up doing here was moving 77C.log to a backup location and running "splitter" again. it continued and processed all the rest of the files just fine. now, how do i get 77C.log processed? also, should i have backed up the 'del.log' file too? i noticed when splitter did its work, there were files deleted as well.
UdmSearch: search.cgi cachemode and php
a while back someone mentioned piping results from search.cgi to a php script, to display results and provide for interaction between a php form and search.cgi. i can't seem to find this in the mailing list or web board using search. i have no idea when the php people will add in the udm module so i can use php to display and do my queries. the problem is our site is template based, changed by visitors' preferences, so it would help to have this function.
UdmSearch: Re: http://www.dma.state.mn.us/
it stalls on 3.1.8 with default timeouts. i just installed 3.1.9pre and will try that out. on 3.1.8, by reducing the timeouts, indexer will not stall. --- Sergey Kartashoff [EMAIL PROTECTED] wrote: Hi! http://www.pca.state.mn.us/water/basins/mnriver/ Indexer[22838]: [1] http://www.dma.state.mn.us/ The arrow shows where it hangs. Here is what 'ps' I tried this with mnogoSearch-3.1.9.pre12. It correctly says that it cannot connect to this site on port 80. -- Regards, Sergey aka gluke.
UdmSearch: 3.1.9pre13 splitter -p doesn't remove log files
the manual says: B. Preparing cachelogd logs for creating word indexes: Run splitter with the "-p" command line argument: /usr/local/mnogosearch/sbin/splitter -p This operation takes all available logs in the /var/raw/ directory, divides the logs into 4096 parts (one file for each low-level word index directory) and stores data acceptable by splitter in the /var/splitter/ directory. All processed logs in the /var/raw/ directory are removed automatically after this operation. i ran 'splitter -p' and then 'splitter', but the logs are still there. bug?

# ls -la /data/mn*/var/raw
total 1154
drwxr-xr-x  2 root  wheel     512 Jan 12 16:05 .
drwxr-xr-x  6 root  wheel     512 Jan 12 15:16 ..
-rw-r--r--  1 root  wheel    3336 Jan 12 16:05 979337154.del
-rw-r--r--  1 root  wheel  552836 Jan 12 16:05 979337154.wrd
-rw-r--r--  1 root  wheel    3648 Jan 12 16:45 del.log
-rw-r--r--  1 root  wheel  591672 Jan 12 16:45 wrd.log

i know the files in ./splitter/* have to be removed manually though.
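for reference, 4096 parts corresponds to one file per low-level directory of the var/tree/ hierarchy: paths like tree/77/B/77BE1000 in splitter's output suggest the top 12 bits of a 32-bit hash select the bucket (two hex digits for the upper directory, one for the lower), and log names like 77C.log match the same prefix. a sketch of that mapping; the exact bit split is my inference from the path names, not mnogosearch's actual code:

```python
def bucket_path(wid):
    """Map a 32-bit hash to one of 4096 low-level index directories.

    Mimics the var/tree/XX/Y/ layout seen in splitter's output: the top
    12 bits of the hash choose the bucket, split into a two-hex-digit
    upper directory and a one-hex-digit lower directory. This bit split
    is an assumption based on the observed paths.
    """
    hi = (wid >> 20) & 0xFFF          # top 12 bits -> 4096 buckets
    return "%02X/%X" % (hi >> 4, hi & 0xF)
```

for example, the hash 0x77BE1000 from the splitter log above lands in bucket 77/B, which is exactly where that cache file lives in the tree.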
Re: UdmSearch: Webboard: Indexer dies at seemingly random intervals
--- Alexander Barkov [EMAIL PROTECTED] wrote: Does indexer hang on local web space or on remote servers which are far from indexer's machine? indexing all from remote machines. see this message: http://search.mnogo.ru/board/message.php?id=1024 in summary, i think there are bugs in checking for network timeouts or in timing out on docs that can't be retrieved. mocha wrote: Author: mocha Email: [EMAIL PROTECTED] Message: i've seen this too. i would start like 10 indexer processes and let them run overnight. then when i wake up in the morning, i see most are in a stale idle state. the associated postgres process is also idle. i'm not sure if it's relevant, but it started happening after i reached about 20,000 web pages indexed. here you can see that 3 of the indexers are just sitting idle:

22838 p1 I  0:33.82 indexer
18309 p3 Is 0:00.05 -sh
18312 p3 S  0:00.21 sh
22945 p3 I  0:11.62 indexer -l
22948 p3 I  0:14.57 indexer -l
22951 p3 S  0:33.17 indexer -l
22953 p3 S  0:24.93 indexer -l

just last night after indexing for a long time, in the morning i found all 10 indexers stalling. i sent a 'kill -HUP' to the idling indexer processes and executed indexer again. then they started indexing. it's been a few hours and a few are starting to go stale or idle again. i'm on NetBSD/Alpha 1.5.1_ALPHA. again, i didn't see this behavior 'til around 20,000 URLs indexed.

# indexer -S
UdmSearch statistics
Status  Expired  Total
--------------------------------------------
     0     8835   9921  Not indexed yet
   200        0  23068  OK
   300        0      1  Multiple Choices
   301        0     13  Moved Permanently
   302        0     32  Moved Temporarily
   401        0      1  Unauthorized
   403        0     15  Forbidden
   404        0    215  Not found
   500        0      1  Internal Server Error
   503        0      6  Service Unavailable
--------------------------------------------
 Total     8835  33273

i just checked before sending this message, and ALL four indexers are stale again:

22838 p1 I  0:33.82 indexer
18309 p3 Is 0:00.05 -sh
18312 p3 S  0:00.21 sh
22945 p3 I  0:11.62 indexer -l
22948 p3 I  0:14.57 indexer -l
22951 p3 I  0:36.71 indexer -l
22953 p3 I  0:25.60 indexer -l
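a hang with no error is consistent with a blocking read on a socket that never gets data. the usual fix is a hard deadline on every network operation, so a stalled server raises an error instead of leaving the process idle forever. a sketch of the pattern in Python (illustrative only -- the indexer itself is C, and fetch_with_timeout is my name, not a mnogosearch function):

```python
import urllib.request

def fetch_with_timeout(url, seconds=30):
    """Fetch a URL but give up after `seconds` rather than blocking forever.

    With a timeout set, a server that accepts the connection but never
    sends data raises an exception instead of leaving the process in the
    idle state seen in the ps listings above. Returns the body bytes, or
    None on any network failure (timeout, refused connection, etc.).
    """
    try:
        with urllib.request.urlopen(url, timeout=seconds) as response:
            return response.read()
    except OSError:          # URLError and socket.timeout are OSError subclasses
        return None
```

the same idea applies in C via setsockopt(SO_RCVTIMEO)/select() or an alarm() around the read loop, which is presumably what reducing the timeouts in the 3.1.8 config works around.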
Re: UdmSearch: Webboard: neil@integrals.co.nz
they could have kept with one search engine and pooled their efforts together. --- Anonymous [EMAIL PROTECTED] wrote: Author: Neil Fincham Email: Message: Has anyone seen ASPseek? It's available at http://www.sw.com.sg/products/aspseek/ and it looks very similar to mnogosearch. Actually it is mnogosearch; the thanks file reads: "We would like to thank developers of UdmSearch (now known as MnogoSearch) search engine and especially Alexander Barkov who started that project for the ideas and source code which we used in ASPSeek." It is under the GPL license and they have a few advancements that are quite good (phrase search being one of them). Perhaps we should port a few of the good bits back :-). Reply: http://search.mnogo.ru/board/message.php?id=1044
UdmSearch: anyone using (super fast) cache mode indexing?
i'm currently using pgsql and mnogosearch 3.1.8, and searching takes forever -- especially when searching for multiple words. i looked into the new cache indexing mode and it seems fantastically fast. you can try it out at: http://udm.aspseek.com/cgi-bin/search.cgi i would like to know if anyone is using it in production? how do you like it? what about the manual maintenance steps you have to take? is there any way to turn the data that's already indexed in sql into cache mode, or is reindexing the only way to do it? i'm just amazed how fast the results are.
Re: UdmSearch: Webboard: neil@integrals.co.nz
yes you did. you dl-ed it, compiled it, then ran it. and even searched with it. well, your efforts aren't in vain though. you saved me some trouble ;-) --- Neil [EMAIL PROTECTED] wrote: they could have kept with one search engine and pooled their efforts together. I agree, I have just compiled it; it is radically different in a lot of ways. quite easy to crash though, all I have to do is search for more than one word at the same time :-) (perhaps I'm doing something wrong). Neil
Re: UdmSearch: Webboard: indexing urls in the server table
--- "L.T. Harris" [EMAIL PROTECTED] wrote: Author: L.T. Harris Email: [EMAIL PROTECTED] Message: Thanks, but I\'m more confused than ever now. What is the perpose of the sever table that is created by the create file server.txt? well if you can get the data into that db table you can use the command: ServerTable your_table_name_1 your_table_name_2 in your indexer.conf file to get that information. but back to your original question. if you are going to user the server table then you'll need to specify the follow command as i stated in the last message. set 'follow' to: page - if you want to index only that particular page site - index the whole site I have quite a large number of URL I don\'t want to have to put each one in the indexer.conf file (That is what you\'re saying I have to do or am just I being dumb). Thanks! Reply: http://search.mnogo.ru/board/message.php?id=1022 __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED] __ Do You Yahoo!? Yahoo! Photos - Share your holiday photos online! http://photos.yahoo.com/ __ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]
Re: UdmSearch: Indexer hang after 1 to 2 hours
i'm having the same problem. i emailed the list and posted on the message board, but still no response. i've read about others having the same problem, so you aren't alone here. --- Ernesto Vargas [EMAIL PROTECTED] wrote: I am having problems running the indexer for more than 1 or 2 hours. It just hangs without any error. I was running 5 indexers at the same time, but the same thing happened with 1 indexer at a time. Any suggestions on improving indexer response time?
UdmSearch: indexer hangs idle on some webpages
i'm indexing my state government's web sites. however, there are some sites that indexer just stalls on. See the last line below:

...
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/artcl-30.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/artcl-31.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/artcl-32.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/artcl-33.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/artcl-34.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/apndx-a.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/apndx-b.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/apndx-c.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/appd-d-i.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/apndx-j.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/apndx-k.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/apndx-l.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/apndx-m.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/apndx-n.htm
Indexer[22838]: [1] http://www.doer.state.mn.us/lr-mlea/apndx-o.htm
Indexer[22838]: [1] http://www.pca.state.mn.us/water/basins/mnriver/plancomment.html
Indexer[22838]: [1] http://www.pca.state.mn.us/water/basins/mnriver/mgmt-fw.html
Indexer[22838]: [1] http://www.pca.state.mn.us/water/basins/mnriver/mnorgs.html
Indexer[22838]: [1] http://www.pca.state.mn.us/water/basins/mnriver/watersheds.html
Indexer[22838]: [1] http://www.pca.state.mn.us/water/basins/mnriver/publications.html
Indexer[22838]: [1] http://www.pca.state.mn.us/water/basins/mnriver/
Indexer[22838]: [1] http://www.dma.state.mn.us/   <--

The arrow shows where it hangs. Here is what 'ps' shows:

22838 p1 I  0:33.82 indexer

it's been idling for over 30 minutes, which in turn causes the associated pgsql process to idle too. what could cause this?
UdmSearch: multiple simultaneous indexers
Am I understanding this correctly? If I don't compile with pthreads, I can't run multiple indexers at once? Right now on NetBSD/Alpha we don't have native threads; would it be possible to change mnogosearch to support other userland thread packages like GNU Pth, PTL2, or MIT pthreads?
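worth noting: pthreads should only matter for running multiple indexing threads inside one indexer binary. the earlier messages in this thread show several independent 'indexer -l' processes running at once on the same NetBSD/Alpha box, which is process-based parallelism and needs no threads at all. a sketch of that pattern (the function name is mine, and the indexer path is whatever your install uses, e.g. /usr/local/mnogosearch/sbin/indexer):

```python
import subprocess

def run_indexers(indexer_path, count=4, extra_args=("-l",)):
    """Start `count` independent indexer processes and wait for them all.

    Each child is a separate OS process with its own address space, so no
    thread library is required; the database serializes their access to
    the shared URL queue. Returns the list of exit codes.
    """
    procs = [subprocess.Popen([indexer_path, *extra_args])
             for _ in range(count)]
    return [p.wait() for p in procs]
```

the shell equivalent is simply launching indexer several times in the background and waiting; threads buy lower per-worker overhead, not the ability to parallelize as such.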