Re: UdmSearch: DeleteNoServer still broken in 3.1.9

Alexander Barkov Tue, 30 Jan 2001 22:45:16 -0800
That's strange. I've tested your indexer.conf and everything works fine:
indexer does not delete this URL.



Caffeinate The World wrote:
> 
> i reported this back in 3.1.9pre13. i have 'DeleteNoServer no' set with many
> URL's in my sql db not having associated Server commands. here i just tried to
> reindex and i see that my URL is being deleted:
> 
> # indexer -m -s 200
> Indexer[2397]: indexer from mnogosearch-3.1.9/PgSQL started with
> '/usr/local/install/mnogosearch-3.1.9/etc/indexer.conf'
> jobs
> Indexer[2397]: [1] http://www.mnworkforcecenter.org/lmi/pub1/mms/index.htm
> Indexer[2397]: [1] No 'Server' command for url... deleted.
> ^C
> Received signal 2 - exit!
> (NOTE: i had to Ctrl-C it to stop it from deleting more URL's.)
> 
> here is my full indexer.conf:
> 
> ---cut---
> #Include inc1.conf
> 
> DBAddr          pgsql://***:*****@/work/
> DBMode cache
> #SyslogFacility local7
> LogdAddr localhost:7000
> LocalCharset iso-8859-1
> Ispellmode db
> StopwordTable stopword
> 
> #ServerTable server
> 
> DeleteNoServer no
> 
> #Allow *
> 
> #Disallow NoMatch *.state.mn.us/*
> Disallow http://www.rootsweb.com/~mn*
> Disallow http://www.wxusa.com/*
> Disallow http://www.vitalrec.com/*
> Disallow http://*yahoo.com/*
> Disallow http://*aol.com/*
> Disallow http://www.salescircular.com/*
> Disallow http://*.wellsfargo.com/*
> # Disallow any except known extensions and directory index using "regex" match:
> Disallow NoMatch Regex \/$|\/SMTMall|\.htm$|\.html$|\.shtml$|\.jhtml$|\.phtml$|\.php$|\.php3$|\.asp|\.txt$
> # Exclude cgi-bin and non-parsed-headers using "string" match:
> Disallow */cgi-bin/* *.cgi */nph-*
> # Exclude anything with '?' sign in URL. Note that '?' sign has a
> # special meaning in "string" match, so we have to use "regex" match here:
> #Disallow Regex  \?
> 
> # Exclude some known extensions using fast "String" match:
> Disallow *.b    *.sh   *.md5  *.rpm
> Disallow *.arj  *.tar  *.zip  *.tgz  *.gz   *.z     *.bz2
> Disallow *.lha  *.lzh  *.rar  *.zoo  *.ha   *.tar.Z
> Disallow *.gif  *.jpg  *.jpeg *.bmp  *.tiff *.tif   *.xpm  *.xbm *.pcx
> Disallow *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie *.mov  *.dat
> Disallow *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff  *.ra
> Disallow *.vrml *.wrl  *.png
> Disallow *.exe  *.com  *.cab  *.dll  *.bin  *.class *.ex_
> Disallow *.tex  *.texi *.xls  *.doc  *.texinfo
> Disallow *.rtf  *.pdf  *.cdf  *.ps
> Disallow *.ai   *.eps  *.ppt  *.hqx
> Disallow *.cpt  *.bms  *.oda  *.tcl
> Disallow *.o    *.a    *.la   *.so
> Disallow *.pat  *.pm   *.m4   *.am   *.css
> Disallow *.map  *.aif  *.sit  *.sea
> Disallow *.m3u  *.qt   *.mov
> 
> # Exclude Apache directory list in different sort order using "string" match:
> Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D
> 
> # More complicated case. RAR .r00-.r99, ARJ a00-a99 files
> # and unix shared libraries. We use "Regex" match type here:
> Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
> 
> #CheckOnly *.b    *.sh   *.md5
> #CheckOnly *.arj  *.tar  *.zip  *.tgz  *.gz
> #CheckOnly *.lha  *.lzh  *.rar  *.zoo  *.tar*.Z
> #CheckOnly *.gif  *.jpg  *.jpeg *.bmp  *.tiff
> #CheckOnly *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie
> #CheckOnly *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff
> #CheckOnly *.vrml *.wrl  *.png
> #CheckOnly *.exe  *.cab  *.dll  *.bin  *.class
> #CheckOnly *.tex  *.texi *.xls  *.doc  *.texinfo
> #CheckOnly *.rtf  *.pdf  *.cdf  *.ps
> #CheckOnly *.ai   *.eps  *.ppt  *.hqx
> #CheckOnly *.cpt  *.bms  *.oda  *.tcl
> #CheckOnly *.rpm  *.m3u  *.qt   *.mov
> #CheckOnly *.map  *.aif  *.sit  *.sea
> #
> # or check ANY except known text extensions using "regex" match:
> #Check NoMatch Regex \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$
> 
> #HrefOnly */mail*.html */thread*.html
> 
> UseRemoteContentType yes
> 
> AddType text/plain      *.txt  *.pl *.js *.h *.c *.pm *.e
> AddType text/html       *.html *.htm *.m
> AddType image/x-xpixmap *.xpm
> AddType image/x-xbitmap *.xbm
> AddType image/gif       *.gif
> AddType Regex \.r[0-9][0-9]$
> AddType application/unknown *.*
> 
> #Mime application/msword       "text/plain; charset=cp1251"   "catdoc $1"
> #Mime application/x-troff-man  text/plain                     "deroff"
> #Mime text/x-postscript        text/plain                     "ps2ascii"
> 
> Period 6m
> #Tag <string>
> #Category FFAABBCCDD
> MaxHops 56
> MaxNetErrors 6
> ReadTimeOut 30s
> DocTimeOut 1m30s
> NetErrorDelayTime 1d
> Robots yes
> Clones yes
> BodyWeight 2
> TitleWeight 4
> KeywordWeight 8
> DescWeight 16
> #UrlWeight 16
> #UrlHostWeight 8
> #UrlPathWeight 8
> #UrlFileWeight 0
> #IspellCorrectFactor    1
> #IspellIncorrectFactor  1
> #NumberFactor 1
> #AlnumFactor  1
> #MinWordLength 1
> #MaxWordLength 32
> #DeleteBad no
> Index yes
> Follow path
> Server site http://www.state.mn.us/
> Server site http://www.exploreminnesota.com/
> Server site http://www.tpt.org/
> Server page http://www.gorp.com/gorp/location/mn/mn.htm
> Server path http://lists.rootsweb.com/index/usa/MN/
> #Server site http://www.mallofamerica.com/
> 
> #Realm [String|Regex] [Match|NoMatch] <arg> [alias]
> Realm http://*.mn.us/*
> #Realm http://*
> 
> #URL http://localhost/main/index.html
> ---/cut---
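
A rough way to see which URLs fall outside every Server and Realm command in the config above is sketched below in Python. This is only an approximation under stated assumptions: `has_server` is a hypothetical helper, not mnoGoSearch code, and it assumes `site` matches on host, `path` on a path prefix, `page` on the exact URL, and `Realm` on a shell-style wildcard. The real indexer's matching rules may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# (subsection, URL) pairs copied from the active Server lines in the quoted config.
SERVERS = [
    ("site", "http://www.state.mn.us/"),
    ("site", "http://www.exploreminnesota.com/"),
    ("site", "http://www.tpt.org/"),
    ("page", "http://www.gorp.com/gorp/location/mn/mn.htm"),
    ("path", "http://lists.rootsweb.com/index/usa/MN/"),
]

# Wildcard patterns copied from the active Realm lines.
REALMS = ["http://*.mn.us/*"]


def has_server(url):
    """Return True if url is covered by a Server or Realm entry.

    Hypothetical approximation of the matching, not the indexer's own logic.
    """
    u = urlsplit(url)
    for kind, pattern in SERVERS:
        p = urlsplit(pattern)
        same_host = (u.scheme, u.netloc) == (p.scheme, p.netloc)
        if kind == "site" and same_host:
            return True
        if kind == "path" and same_host and u.path.startswith(p.path):
            return True
        if kind == "page" and url == pattern:
            return True
    # Assumed: Realm uses shell-style wildcards where '*' matches any run of characters.
    return any(fnmatchcase(url, realm) for realm in REALMS)


if __name__ == "__main__":
    # The URL reported as deleted in the log above.
    print(has_server("http://www.mnworkforcecenter.org/lmi/pub1/mms/index.htm"))
```

Under these assumptions the URL from the log matches no Server or Realm entry, so it is the kind of URL that `DeleteNoServer no` is expected to keep.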