UdmSearch: space in URL causes error

2001-02-16 Thread Caffeinate The World

when indexing (version 3.1.10), any URL with a space (%20) causes
the error:

Too many network errors for this server, skipped

but the URL does load fine in a browser.

...
Indexer[21663]: [1]
http://www.co.dakota.mn.us/socialservices/chcare/COMPLAINTS.htm
Indexer[21663]: [1]
http://www.co.aitkin.mn.us/board%20minutes/2000/July%2025.htm
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1]
http://www.co.aitkin.mn.us/board%20minutes/2000/November%207.html
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1] http://www.co.dakota.mn.us/parks/ski%20pass.htm
Indexer[21663]: [1]
http://www.co.aitkin.mn.us/board%20minutes/2000/November%2014.html
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1]
http://www.co.aitkin.mn.us/board%20minutes/2000/November%2021.html
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1]
http://www.co.aitkin.mn.us/board%20minutes/2000/November%2028.html
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1]
http://www.co.aitkin.mn.us/board%20minutes/2000/April%2025.htm
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1]
http://www.co.aitkin.mn.us/board%20minutes/2000/August%201.htm
Indexer[21663]: [1] Too many network errors for this server, skipped
Indexer[21663]: [1]
http://www.co.aitkin.mn.us/board%20minutes/2000/August%204.htm
Indexer[21663]: [1] Too many network errors for this server, skipped
...
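One way to narrow this down is to isolate the affected entries from the seed list and fetch them by hand first. A minimal sketch, assuming the URLs come from a seed file like the one fed to 'indexer -i -f' elsewhere in this thread (the file name here is illustrative):

```shell
# Build a throwaway seed file and pull out every URL that contains
# a percent-encoded space, so those can be fetched manually first.
cat > urls-demo.txt <<'EOF'
http://www.co.dakota.mn.us/socialservices/chcare/COMPLAINTS.htm
http://www.co.aitkin.mn.us/board%20minutes/2000/July%2025.htm
EOF
grep '%20' urls-demo.txt
# → http://www.co.aitkin.mn.us/board%20minutes/2000/July%2025.htm
```

If those URLs fetch fine with a manual HTTP request (as the browser test above suggests), the fault likely lies in how indexer builds the request for encoded paths, not in the server.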

__
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail - only $35 
a year!  http://personal.mail.yahoo.com/
__
If you want to unsubscribe send "unsubscribe udmsearch"
to [EMAIL PROTECTED]




Re: UdmSearch: no files found in mirror directories

2001-02-15 Thread Caffeinate The World


--- Alexander Barkov [EMAIL PROTECTED] wrote:
 I tried with your indexer.conf and everything works fine.
 Check that you have enough permissions to write to those directories.

one of the first things i checked. i ran indexer as root, so it
shouldn't matter. but yes, write permission is there.

would it make any difference if the URLs were loaded into the db using
'indexer -i -f urls.txt' first, and then i changed indexer.conf to add
the mirror settings?

 
 Caffeinate The World wrote:
  
  --- Alexander Barkov [EMAIL PROTECTED] wrote:
   That's strange for me. I've just checked this config and everything
   works fine:
  
  
   DBAddr mysql://foo:bar@localhost/udm/
   MirrorRoot /usr/local/mnogosearch/var/mirror/
   Realm   http://localhost/*
   URL http://localhost/
  
  i've seen URLs like *.mn.us/* being indexed, but still nothing in
  the mirror directories. this is very odd.
  
  
  
  
  
   Caffeinate The World wrote:
   
 The Mirror commands must be used BEFORE Server commands; they are
 per-server commands, so you can use a different mirror location for
 different sites.
   
---
...
#MaxWordLength 32
#DeleteBad no
Index yes
Follow path
# store a copy of each page locally
MirrorRoot /data/mnogosearch/mirror/pages
MirrorHeadersRoot /data/mnogosearch/mirror/headers
MirrorPeriod 6m
Server site http://www.state.mn.us/
Server site http://www.mnworkforcecenter.org/
Server site http://www.exploreminnesota.com/
Server site http://www.tpt.org/
Server page http://www.gorp.com/gorp/location/mn/mn.htm
Server path http://lists.rootsweb.com/index/usa/MN/
#Server site http://www.mallofamerica.com/
...
   
so it is before the Server commands. however, i'm indexing from a
list of URLs which may not have a Server command. is there a way to
mirror all URLs we index? like a 'MirrorAll yes' or something.

in my case, i use a Realm *.mn.us/* to index arbitrary sites that
match. there is no way to know in advance what the server is, to
provide a mirror setting specifically for it.
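For the Realm-based setup described above, one sketch of a workaround (not a confirmed feature) is to declare the Mirror commands before a broad Realm so every matching URL picks them up; the paths and pattern below are taken from this thread:

```
# indexer.conf sketch: Mirror settings declared before any
# Server/Realm command apply to the entries defined after them
MirrorRoot        /data/mnogosearch/mirror/pages
MirrorHeadersRoot /data/mnogosearch/mirror/headers
MirrorPeriod      6m
Realm http://*.mn.us/*
```

Whether Mirror settings also attach to URLs loaded with 'indexer -i -f' (which have no Server entry at all) is exactly the open question in this thread, so this sketch only covers the Realm case.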






Re: UdmSearch: no files found in mirror directories

2001-02-15 Thread Caffeinate The World


--- Zenon Panoussis [EMAIL PROTECTED] wrote:
 
 
 Caffeinate The World skrev:
  
  
  i have indexer going but i see nothing in the mirror directories.
 when
  does it store the pages to the mirror directory?
 
 If your pages are already indexed, when you re-index with -a 
 indexer will check the headers and only download files that 
 have been modified since the last indexing. Thus, all pages 
 that are not modified will not be downloaded and therefore not
 mirrored either. To create the mirror you need to either 
 (a) start again with a clean database or (b) use the -m switch. 

i was indexing with a clean slate (sort of). i inserted about 500,000
urls into the db from an external file using 'indexer -i -f url.file'.
that was before i had any mirror options in indexer.conf. after all
the URLs were inserted, and about 10K of the URLs were indexed, i
stopped indexer and edited indexer.conf to add the mirror options. so
technically there are still 490K URLs left that have never been
indexed, so at least some of them should get mirrored.





Re: UdmSearch: Webboard: Segfault (grrr)

2001-02-15 Thread Caffeinate The World


--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Caffeinate The World wrote:
  
  --- Alexander Barkov [EMAIL PROTECTED] wrote:
 Hello!
  
   We finally found a bug in cache.c. The new version is in the
   attachment. Everyone who has problems with splitter crashes is
   welcome to test.
  
  should the 'tree' directory be removed? can we split the raw log
 files
  we have thus far or is re-indexing necessary?
 
 
 I hope it will work without having to remove the tree directory,
 but it is better to remove it. It is safe to use the old /raw and
 /splitter files without having to reindex.

ok. what exactly was the bug?





Re: UdmSearch: Webboard: Segfault (grrr)

2001-02-15 Thread Caffeinate The World


--- Alexander Barkov [EMAIL PROTECTED] wrote:
   Hello!
 
 We finally found a bug in cache.c. The new version is in the
 attachment. Everyone who has problems with splitter crashes is
 welcome to test.

should the 'tree' directory be removed? can we split the raw log files
we have thus far or is re-indexing necessary?





Re: UdmSearch: Webboard: Segfault (grrr)

2001-02-15 Thread Caffeinate The World

i didn't get this error on my NetBSD/Alpha. compile was fine.
what system are you on?

--- Zenon Panoussis [EMAIL PROTECTED] wrote:
 
 
 Alexander Barkov skrev:
  
 
  We finally found a bug in cache.c. New version is in attachement.
  Everybody who has problems with splitter's crashes are welcome to
 test.
  Please, give feedback!
 
 Oops. Something else is not OK: 
 
 cache.c:687:87: warning: #ifdef with no argument
 cache.c:692:87: warning: #ifdef with no argument
 cache.c:697:87: warning: #ifdef with no argument
 cache.c:702:87: warning: #ifdef with no argument
 cache.c: In function `UdmFindCache':
 cache.c:969: parse error before `?'
 cache.c:982: `real_num' undeclared (first use in this function)
 cache.c:982: (Each undeclared identifier is reported only once
 cache.c:982: for each function it appears in.)
 cache.c:994: `fd1' undeclared (first use in this function)
 cache.c:996: `group' undeclared (first use in this function)
 cache.c:1000: `group_num' undeclared (first use in this function)
 cache.c: At top level:
 cache.c:1011: initializer element is not constant
 cache.c:1011: warning: data definition has no type or storage class
 cache.c:1012: parse error before string constant
 cache.c:1013: parse error before string constant
 cache.c:1013: warning: data definition has no type or storage class
 cache.c:1014: redefinition of `ticks'
 cache.c:1011: `ticks' previously defined here
 cache.c:1014: initializer element is not constant
 cache.c:1014: warning: data definition has no type or storage class
 cache.c:1015: parse error before string constant
 cache.c:1015: warning: data definition has no type or storage class
 cache.c:1024: `i' undeclared here (not in a function)
 cache.c:1024: parse error before `.'
 cache.c:1030: register name not specified for `p'
 cache.c:1032: parse error before `if'
 cache.c:1035: `pmerg' undeclared here (not in a function)
 cache.c:1035: `pmerg' undeclared here (not in a function)
 cache.c:1035: warning: data definition has no type or storage class
 cache.c:1036: parse error before `'
 cache.c:1043: `k' undeclared here (not in a function)
 cache.c:1043: warning: data definition has no type or storage class
 cache.c:1044: parse error before `}'
 cache.c:1046: conflicting types for `p'
 cache.c:1030: previous declaration of `p'
 cache.c:1046: `pmerg' undeclared here (not in a function)
 cache.c:1046: warning: data definition has no type or storage class
 cache.c:1047: parse error before `'
 cache.c:1048: parse error before `-'
 cache.c:1058: warning: initialization makes integer from pointer
 without
 a cast
 cache.c:1058: warning: data definition has no type or storage class
 cache.c:1058: parse error before `}'
 cache.c:1061: redefinition of `ticks'
 cache.c:1014: `ticks' previously defined here
 cache.c:1061: initializer element is not constant
 cache.c:1061: warning: data definition has no type or storage class
 cache.c:1063: parse error before string constant
 cache.c:1071: warning: parameter names (without types) in function
 declaration
 cache.c:1071: conflicting types for `UdmGroupByURL'
 ../include/udm_searchtool.h:7: previous declaration of
 `UdmGroupByURL'
 cache.c:1071: warning: data definition has no type or storage class
 cache.c:1072: parse error before `}'
 make[1]: *** [cache.lo] Error 1
 make[1]: Leaving directory `/root/mnogosearch-3.1.10/src'
 make: *** [all-recursive] Error 1
 
 
 -- 
 oracle@everywhere: The ephemeral source of the eternal truth...






UdmSearch: no files found in mirror directories

2001-02-14 Thread Caffeinate The World

i'm trying to store all web pages locally so i don't have to go fetch
them on the internet each time i re-index.

i have indexer going but i see nothing in the mirror directories. when
does it store the pages to the mirror directory?

# grep Mirror indexer.conf
MirrorRoot /data/mnogosearch/mirror/pages
MirrorHeadersRoot /data/mnogosearch/mirror/headers
MirrorPeriod 6m
# ls -l /data/mnogosearch/mirror/*
/data/mnogosearch/mirror/headers:

/data/mnogosearch/mirror/pages:



__
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail - only $35 
a year!  http://personal.mail.yahoo.com/
__
If you want to unsubscribe send "unsubscribe udmsearch"
to [EMAIL PROTECTED]




Re: UdmSearch: Webboard: Segfault (grrr)

2001-02-14 Thread Caffeinate The World


--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Alexander Barkov wrote:
  
   i completely forgot about this feature!!! i read about it when i
 first
   started using mnogosearch, but never bothered to use it.
  
   with the mirror feature, wouldn't it be easy to implement Google's
   "cache" feature, where the user can view a cached copy of the page
   from the last time you indexed?
  
  I think it's possible. Moreover, we may use zlib to compress those
  files,
  so they'll use less space.
 
 
 The only disadvantage is that it will not work on huge search
 engines with millions of documents. There is a limit on the total
 number of files on a file system in most Unixes. For example, my
 30G /usr partition on a FreeBSD box can hold about 8 million files.

is that a per-file-system limit or a per-machine limit?





Re: UdmSearch: no files found in mirror directories

2001-02-14 Thread Caffeinate The World


--- Alexander Barkov [EMAIL PROTECTED] wrote:
 That's strange for me. I've just checked this config and everything
 works fine:
 
 
 DBAddr mysql://foo:bar@localhost/udm/
 MirrorRoot /usr/local/mnogosearch/var/mirror/
 Realm   http://localhost/*
 URL http://localhost/

it's not working on my system because it hasn't indexed the URLs
matching *.mn.us/* yet. it's indexing other URLs that were fed to it
via an external list (indexer -i -f url_list.txt). some of the URLs
in that list don't fit the pattern '*.mn.us/*'.

i could add 'Realm *' to get it to mirror any site, but that would tell
indexer to follow and index anything, which is not what i want. what
i'm looking for is some parameter like DeleteNoServer but for
mirroring, where it would mirror all URLs already in the db or fed to
it by an external list.

 Caffeinate The World wrote:
  
  The Mirror commands must be used BEFORE Server commands; they are
  per-server commands, so you can use a different mirror location for
  different sites.
  
  ---
  ...
  #MaxWordLength 32
  #DeleteBad no
  Index yes
  Follow path
  # store a copy of each page locally
  MirrorRoot /data/mnogosearch/mirror/pages
  MirrorHeadersRoot /data/mnogosearch/mirror/headers
  MirrorPeriod 6m
  Server site http://www.state.mn.us/
  Server site http://www.mnworkforcecenter.org/
  Server site http://www.exploreminnesota.com/
  Server site http://www.tpt.org/
  Server page http://www.gorp.com/gorp/location/mn/mn.htm
  Server path http://lists.rootsweb.com/index/usa/MN/
  #Server site http://www.mallofamerica.com/
  ...
  
  so it is before the Server commands. however, i'm indexing from a
  list of URLs which may not have a Server command. is there a way to
  mirror all URLs we index? like a 'MirrorAll yes' or something.

  in my case, i use a Realm *.mn.us/* to index arbitrary sites that
  match. there is no way to know in advance what the server is, to
  provide a mirror setting specifically for it.
  






Re: UdmSearch: Webboard: Segfault (grrr)

2001-02-14 Thread Caffeinate The World


--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Caffeinate The World wrote:
   The only disadvantage is that it will not work on huge search
   engines with millions of documents. There is a limit on the total
   number of files on a file system in most Unixes. For example, my
   30G /usr partition on a FreeBSD box can hold about 8 million files.
  
  is that a per file system limit or per unix box limit?
  
 
 Per file system limit.

couldn't you do something like mount multiple FS:

sd0a /data/part1
sd1a /data/part2
...
sdna /data/partn

wouldn't that work?
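The limit under discussion is the file system's inode count, which can be checked directly with standard tools (nothing mnoGoSearch-specific; output naturally varies per machine):

```shell
# Show the total number of inodes -- i.e. the file-count ceiling --
# on the file system holding /. -P keeps each entry on one line.
df -P -i / | awk 'NR==2 {print $2}'
```

Mounting several file systems as sketched above does raise the overall ceiling, since each mount brings its own inode table.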






Re: UdmSearch: no files found in mirror directories

2001-02-14 Thread Caffeinate The World


--- Alexander Barkov [EMAIL PROTECTED] wrote:
 That's strange for me. I've just checked this config and everything
 works fine:
 
 
 DBAddr mysql://foo:bar@localhost/udm/
 MirrorRoot /usr/local/mnogosearch/var/mirror/
 Realm   http://localhost/*
 URL http://localhost/

i've seen URLs like *.mn.us/* being indexed, but still nothing in the
mirror directories. this is very odd.

 
 
 
 
 Caffeinate The World wrote:
  
  The Mirror commands must be used BEFORE Server commands; they are
  per-server commands, so you can use a different mirror location for
  different sites.
  
  ---
  ...
  #MaxWordLength 32
  #DeleteBad no
  Index yes
  Follow path
  # store a copy of each page locally
  MirrorRoot /data/mnogosearch/mirror/pages
  MirrorHeadersRoot /data/mnogosearch/mirror/headers
  MirrorPeriod 6m
  Server site http://www.state.mn.us/
  Server site http://www.mnworkforcecenter.org/
  Server site http://www.exploreminnesota.com/
  Server site http://www.tpt.org/
  Server page http://www.gorp.com/gorp/location/mn/mn.htm
  Server path http://lists.rootsweb.com/index/usa/MN/
  #Server site http://www.mallofamerica.com/
  ...
  
  so it is before the Server commands. however, i'm indexing from a
  list of URLs which may not have a Server command. is there a way to
  mirror all URLs we index? like a 'MirrorAll yes' or something.

  in my case, i use a Realm *.mn.us/* to index arbitrary sites that
  match. there is no way to know in advance what the server is, to
  provide a mirror setting specifically for it.
  






Re: UdmSearch: Webboard: 3.1.10 Won't Make or Make install

2001-02-13 Thread Caffeinate The World


--- Adrift [EMAIL PROTECTED] wrote:
 Author: Adrift
 Email: [EMAIL PROTECTED]
 Message:
 every version of mysql I have installed worked perfectly, that is the
 install ran smoothly (I am using FreeBSD 3.4). When I tried to "MAKE"
 the new version of mnogosearch, 3.1.10, I got the error:
 
 Making all in src
 "Makefile", line 390: Need an operator

what does line 390 look like?
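For what it's worth, "Need an operator" is the complaint BSD make raises when it hits GNU-make-only syntax, which automake-generated Makefiles of this era often contain. A small illustration (the conditional below is hypothetical, not the actual line 390):

```shell
# A GNU-make conditional; BSD make stops on lines like this with
# "Need an operator", while GNU make parses them fine.
cat > Makefile.demo <<'EOF'
ifeq ($(CC),gcc)
CFLAGS += -O2
endif
EOF
head -n 1 Makefile.demo
# → ifeq ($(CC),gcc)
```

If line 390 looks like that, building with GNU make (gmake on FreeBSD) instead of the system make is the usual fix.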






Re: UdmSearch: Webboard: CAN I SEND SIGNAL 'TERM' TO INDEXER PROGRAM?

2001-02-13 Thread Caffeinate The World


--- Anonymous [EMAIL PROTECTED] wrote:
 Author: pokistu
 Email: 
 Message:
 Actually I am running indexer, but it is 'eating' a lot of system
 memory. I want to know if I can stop the indexer program with the
 'term' signal (linux) without corrupting the DATABASE.

i do it and i've not noticed any problems. i run postgresql.





UdmSearch: cache mode and table dict

2001-02-13 Thread Caffeinate The World

is the table 'dict' used at all in cache mode? mine doesn't have any records.





Re: UdmSearch: Webboard: Segfault (grrr)

2001-02-13 Thread Caffeinate The World

i've been going through this and back again time and time again. what
would really be nice is if indexer saved the logs in a format that's
easy to reuse; for instance, you could replay the logs to re-index
into sql etc.

that way, if you want to reindex, you don't have to crawl through all
the external websites again. it saves a lot of time and we can debug
faster.

--- Zenon Panoussis [EMAIL PROTECTED] wrote:
 
 
 Zenon Panoussis skrev:
  
  
  Now for 31 MB adventures :)
 
 # ./run-splitter -k
 Sending -HUP signal to cachelogd...
 Done
 # ./run-splitter -p
 Preparing logs...
 Open dir '/var/mnogo3110/raw'
 Preparing word log 982024900  [   42176 bytes]
 Preparing word log 982027284  [31465324 bytes]
 Preparing word log 982027618  [ 8815804 bytes]
 Preparing del log 982024900  
 Preparing del log 982027284
 Preparing del log 982027618
 Renaming logs...
 Done
 
 Running ./run-splitter on these worked fine. No problems at all. 
 After that, I went on indexing and created 
 
 59920 Feb 13 06:05 982040748.del.done
  31457740 Feb 13 06:05 982040748.wrd.done
  1480 Feb 13 06:06 982040807.del.done
637240 Feb 13 06:06 982040807.wrd.done
 51920 Feb 13 07:21 982045300.del.done
  31469304 Feb 13 07:21 982045300.wrd.done
 69248 Feb 13 07:51 982047843.del.done
  30213344 Feb 13 07:51 982047843.wrd.done
 
 another two 31 MB files and two smaller ones. All of them were
 split without problems.
 
 [two days later] 
 
 Indexing kept crashing (see separate posting) and splitting 
 kept going fine until tonight, when the opposite occurred.
 By now, I have almost 1 GB of indexed files, 4 indexer 
 crashes and one splitter crash. I'll do the debugging and 
 post its output tomorrow. 
 
 Z
 
 






Re: UdmSearch: Webboard: Segfault (grrr)

2001-02-12 Thread Caffeinate The World

in my tests your 3 little files wouldn't make a difference. he would
have to run splitter -p and splitter on all the files starting from the
first original RAW file, including all the 31 MB files. i believe in my
case it was the original 31 MB file which caused the problem.

while processing the first 31 MB file, it didn't core dump, but the
files that followed did cause core dumps at unpredictable times, though
often initially at the same location (i.e. 77C3000...)

therefore, in order to recreate the scenario, one would have to start
from the first raw file. i've tar-ed up such a series of files for
Alex. perhaps he'll be able to find out why. my hypothesis is an array
or buffer overflow in splitter.c.




--- Zenon Panoussis [EMAIL PROTECTED] wrote:
 
 
 Alexander Barkov skrev:
  
 
  Can you guys give us a log file produced by splitter -p which
 caused
  crash? We can't reproduce crash :-(
 
 Huh? splitter doesn't accept the -v5 argument, so it won't give
 more detailed logs than the normal ones. The only log I had, the
 one to stdout, is the one I included with my first posting in this
 thread:
 
   Delete from cache-file /var/mnogo319/tree/12/B/12BFD000 
 /var/mnogo319/tree/12/C/12C1 old: 69 new: 1 total: 70 
 ./run-splitter: line 118: 18790 Segmentation fault (core
 dumped) $SPLITTER 
 
 Until this point everything was normal. 
 
 Anyway, as I said, I strongly suspect corruption in the word 
 database. On a previous occasion when this happened, I deleted 
 the entire tree/* directory structure and started all over again. 
 Splitter worked like a dream with both small and big log files 
 until one of the following occurred:
 
 1. I stopped indexer with ^C and then run splitter 
or
 2. Splitter had to work itself through some 31 MB files. (These 
files are not all the same size; they tend to get slightly 
bigger the more they are, i.e. something like this:
  0001.log31.500.000 bytes
  0002.log31.550.000 bytes 
  0003.log31.580.000 bytes
sort of). 
 
 Unfortunately I haven't been making notes, so I can't tell for 
 sure which one of these two things happened before things stopped 
 working. 
 
 I tried splitter again today with ./splitter splitter.log . It 
 went in a very normal way *almost* as far as yesterday, and then 
 hung so badly that not even kill -9 could kill it. The log of
 this run looks like 
 
 snip normal operation
 Delete from cache-file /var/mnogo319/tree/12/B/12B27000
 Delete from cache-file /var/mnogo319/tree/12/B/12B2D000
 Delete from cache-file /var/mnogo319/tree/12/B/12B3
 Delete from cache-file /var/mnogo319/tree/12/B/12B31000
 Delete from cache-file /var/mnogo319/tree/12/B/12B3
 
 I am attaching the three files that could be involved, 
 namely tree/12/B/12B31000, 12B32000 and 12B35000. 
 
 
 I'll install 3.1.10 now, try it on the old word database and see 
 what it does. If it doesn't work, I'll remove the word database 
 and start again from scratch. I'll try to make detailed notes this 
 time and report back. 
 
 Z
 
 

 ATTACHMENT part 2 application/x-gzip name=wordfiles.tar.gz







Re: UdmSearch: Webboard: This is SHITE!!!

2001-02-12 Thread Caffeinate The World

you can try http://aspseek.com; i think that's the other one based on
mnogo. or was it aspsearch.com? argh, i forgot. there is also htdig.

check them out. as far as mnogo goes, no one is getting paid for
development here. people spend their time coding and release it for
free. yes, i agree the docs could use some help, but then English isn't
these guys' native tongue. in addition, if anyone wants to write the
docs, by all means, no one is stopping you from contributing.

i know at times it can be frustrating when things don't seem to go
right. but you have to be patient because there are many bugs, many
platforms, variations of OSes, etc. i've been describing a bug in cache
mode for over a month, and they are working on tracking it down. it's
hard when i see it on my system but they can't reproduce it. as of
late, a few others cited the same bug in cache mode. see, i've been
patient for over a month. during that time, i've tried different
scenarios to help narrow down the problem. 

my point is, i'm not paying these guys, and so i don't really have any
rights to whine about it. the best i can do is to try to help with what
ever effort i can contribute. but if i can't, i just try to be patient
while they do their work. 

posting a comment such as yours won't help get things working for you.
in fact, it may hinder your progress and others', as it may piss them
off. we all get frustrated at times, but try to describe your problems
in a detailed manner, and be patient. alex, serge, and others have
been more than generous with their time and efforts, but like
everything else in life that's free, there is no guarantee.

--- Anonymous [EMAIL PROTECTED] wrote:
 Author: Joe B
 Email: 
 Message:
 Hello ALL,
 
 After spending nearly 3 Days trying to get this thing to work, I have
 come to the conclusion that it is a waste of time and a JOKE:-(..)
 
 The documentation is poor and the support I am getting from this
 board is daft. Does anyone else know of any alternative? If so please
 let me know. 
 
 See my postings below to see problems I have been having and the
 replies  i get and you will see why I am feeling this way.
 
 
 
 Reply: http://search.mnogo.ru/board/message.php?id=1335
 






Re: UdmSearch: Webboard: Segfault (grrr)

2001-02-11 Thread Caffeinate The World

i reported this problem a while back. i believe it's being worked on.
at least they recently found the bug that kept it from splitting out
to FFF. the seg fault happens during the splitter process, not
indexing. i've been running splitter when the logs are at about 2 MB
and i've not had splitter core dump on me yet. but before, when i let
the log file build up to about 15 to 30 MB, i had that core dump
problem.

i hope this will be resolved soon because it's a pain in the behind.
;-(
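Until the splitter fix lands, the workaround of splitting before the logs grow large can be automated with a size check; a sketch, with a sparse demo file standing in for a real raw log and the ~30 MB threshold taken from the crash reports in this thread:

```shell
# Flag any raw log that has grown past ~30 MB, the size at which the
# crashes were reported. A sparse demo file stands in for a real log;
# point find at the real raw-log directory in practice.
mkdir -p raw-demo
truncate -s 31M raw-demo/982027284.log
find raw-demo -type f -size +30M
# → raw-demo/982027284.log
```

If this prints anything, the log has already grown past the size where the core dumps start, so run-splitter should be invoked before that point.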
--- Zenon Panoussis [EMAIL PROTECTED] wrote:
 Author: Zenon Panoussis
 Email: [EMAIL PROTECTED]
 Message:
 RH Linux 7.0, search 3.1.9, MySQL 3.23.29, cache mode, with the 
 new patches for cache.c and sql.c. 
 
 It happens all the time. It started happening when "maximum size" 
 31 MB log files were indexed, but by now it happens on any indexing, 
 no matter how big or small the log file, as if the database somehow 
 was corrupt:
 
   Delete from cache-file /var/mnogo319/tree/12/B/12BFD000
   /var/mnogo319/tree/12/C/12C1 old:  69 new:   1 total:  70
   ./run-splitter: line 118: 18790 Segmentation fault  (core
 dumped) $SPLITTER
 
 For the same log file it always crashes at the same index file 
 (e.g. every time I try to reindex 12345678.log it will crash 
 at tree/12/3/4567000). If I delete the log file and start again 
 with a new log file, it will crash at a different place, but it 
 will still be consistent in crashing at the same place every time. 
 
 And the backtrace:
 
 # gdb splitter core
 GNU gdb 5.0
 [...]
 This GDB was configured as "i386-redhat-linux"...
 Core was generated by `/usr/local/mnogo319/sbin/splitter'.
 Program terminated with signal 11, Segmentation fault.
 Reading symbols from /usr/lib/mysql/libmysqlclient.so.10...done.
 Loaded symbols for /usr/lib/mysql/libmysqlclient.so.10
 Reading symbols from /lib/libm.so.6...done.
 Loaded symbols for /lib/libm.so.6
 Reading symbols from /usr/lib/libz.so.1...done.
 Loaded symbols for /usr/lib/libz.so.1
 Reading symbols from /lib/libc.so.6...done.
 Loaded symbols for /lib/libc.so.6
 Reading symbols from /lib/libcrypt.so.1...done.
 Loaded symbols for /lib/libcrypt.so.1
 Reading symbols from /lib/libnsl.so.1...done.
 Loaded symbols for /lib/libnsl.so.1
 Reading symbols from /lib/ld-linux.so.2...done.
 Loaded symbols for /lib/ld-linux.so.2
 #0  0x8059061 in UdmSplitCacheLog (log=300) at cache.c:552
 552  
   logwords[count+j].wrd_id=table[w].wrd_id;
 
 (gdb) backtrace
 #0  0x8059061 in UdmSplitCacheLog (log=300) at cache.c:552
 #1  0x8049e89 in main (argc=1, argv=0xba94) at splitter.c:70
 #2  0x4009bbfc in __libc_start_main (main=0x8049d80 main, argc=1,
 ubp_av=0xba94, 
 init=0x80495bc _init, fini=0x8065b7c _fini,
 rtld_fini=0x4000d674 _dl_fini, stack_end=0xba8c)
 at ../sysdeps/generic/libc-start.c:118
 
 Since 3.1.10 is coming out today, I'll try it and see if things 
 work better. If not, I'll post more bad news later ;)
 
 Z
 
 
 
 Reply: http://search.mnogo.ru/board/message.php?id=1320
 






Re: UdmSearch: Webboard: Splitter: core dumped

2001-02-07 Thread Caffeinate The World

i've had this problem since they implemented cache mode. i've written
about it several times (in detail). however, it appears that no one
knows what it is. at first i thought it was my alpha 'til your email.
maybe alex or serge can help. i've also provided backtraces. i'll
wait. for now, i'm indexing but running splitter when the files are
around 2 MB.

--- Zenon Panoussis [EMAIL PROTECTED] wrote:
 
 Caffeinate The World skrev:
  
 
   I run splitter -p and finish fine. I then run splitter and,
   halfway through the splitting, crash: segmentation fault, or
   just a hang, core dumped. So I restart splitter and next time
   finish fine.
 
  what machine are you on? Alpha? OS?
 
 Intel PII, RH Linux 7.0 with 2.2 kernet.
 
  
  i had the same problem and i sent a message to the mailing list
  describing how i corrected it.  search for "core" and "splitter"
 
 Found it. My dump appeared at a different position than yours, 
 at 076, but was just as persistent as yours. Also, the premises 
 are similar: I had run indexer for a long time and I had five 
 31 MB files waiting to be split. Splitter choked every time on 
 the third one of them. This has never happened before or after 
 when the logs have been smaller than 31 MB, so I'm just re-running 
 smaller chunks at a time.
 
 
  can you check another thing? i've never seen my splitter split the
  lasta file "FFF.log". do you get that file? it goes as high as
 FFE.log
  only.
 
 Indeed, last night I saw it stop at FFE.log . But I have had files 
 at tree/FF/F/... , so I assume that other times it went all the 
 way to FFF. 
 
 Z
 
 
 -- 
 oracle@everywhere: The ephemeral source of the eternal truth...
 






Re: UdmSearch: Webboard: slow indexing

2001-02-07 Thread Caffeinate The World


--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Author: Alexander Barkov
 Email: [EMAIL PROTECTED]
 Message:
  I have a question.  I have a servertable of about 20,000 urls, I
 was wondering if that is my performance bottleneck.  It
 seems that Indexer takes all the cpu time on my machine now, and only
 indexes about 20 urls per 10 minutes.  Out of those 20,000 servers, I
 have 690K documents.
  My indexer.conf file is basically:
  
  allownocase *.htm *.html *.pl 
  Robots yes
  DeleteNoServer no
  Deletebad yes
  follow world
  hops 3
  Dbmode crc-multi
  servertable server
  
 
 It is a known problem. We are thinking about how to solve it. For now,
 use the Realm command
 where possible. It allows you to describe several sites or even whole
 domains using only one command.
 
 
 Reply: http://search.mnogo.ru/board/message.php?id=1279
 
 
When I had Realm set to *, it just flew!! But I also ended up getting
URLs I didn't want.






Re: UdmSearch: Webboard: Splitter: core dumped

2001-02-07 Thread Caffeinate The World

I have a gig of RAM here on my Alpha. The only things that have helped
me were changing size_t to u_int32_t in cache.c and size_t to 'unsigned
int' in cachelogd.c; the underlying issue is a 64-bit problem with the
Alpha. With those changes, and splitting in smaller increments (I keep
each log below 5 MB before I run splitter on it), things are fine. Well,
with the exception that I NEVER see FFF.log, so I think something is
still not right with the calculations. I've indexed over a million URLs
at one point and never once got FFF.log.


--- Zenon Panoussis [EMAIL PROTECTED] wrote:
 
 Caffeinate The World skrev:
  
 
 
  i'll wait. for now, i'm indexing but running splitter when the
 files
  are around 2MB.
 
 I've been running indexer -c 3600 since last night, producing 
 log files of 5-10 MB and running splitter every time afterwards, 
 with cleaning of var/splitter and all. So far no problems at all. 
 I have a hunch that the problem is tied to splitting multiple big 
 files in one go. 
 
 A friend offered to lend me some memory. If I can get my ass 
 over there and fetch it, I'll try a huge splitting first with 
 my standard 128 MB RAM and then with 1 GB RAM. If there is any 
 difference in the behaviour of splitter, it will be a good 
 indication of where to look for the problem.
 
 Z
 
 
 -- 
 oracle@everywhere: The ephemeral source of the eternal truth..






Re: UdmSearch: Webboard: Splitter: core dumped

2001-02-06 Thread Caffeinate The World

What machine are you on? Alpha? Which OS?

I had the same problem, and I sent a message to the mailing list
describing how I corrected it. Search for "core" and "splitter".

Can you check another thing? I've never seen my splitter split the
last file, "FFF.log". Do you get that file? Mine goes as high as
FFE.log only.

--- Zenon Panoussis [EMAIL PROTECTED] wrote:
 Author: Zenon Panoussis
 Email: [EMAIL PROTECTED]
 Message:
 
 [3.1.9, cache mode]
 
 I run splitter -p and finish fine. I then run splitter and, 
 halfway through the splitting, crash: segmentation fault, or 
 just a hang, core dumped. So I restart splitter and next time 
 finish fine. 
 
 The question is: what can this do to the word database? Will 
 it still be accurate, or will some words be inserted twice? 
 Can I just re-run and finish and be happy, or should I re-index? 
 
 Z
 
 
 Reply: http://search.mnogo.ru/board/message.php?id=1271
 
 






Re: Re[8]: UdmSearch: php-mnogo

2001-02-02 Thread Caffeinate The World


--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
 Hi!
 
 Friday, February 02, 2001, 8:57:28 AM, you wrote:
 
 CTW i modified libtool a bit and it compiled and apache didn't
 complain.
 CTW i'll try making a sharedlib later. but upon testing it. it's
 VERY FAST.
 
 I have a question: have you compiled threaded version of mnogosearch
 or not ?

No, I haven't. I'm on NetBSD; we don't have native threads yet.





Re: UdmSearch: Webboard: Cache mode questions

2001-02-02 Thread Caffeinate The World

http://search.freewinds.cx/cgi-bin/search2.cgi

--- Alexander Barkov [EMAIL PROTECTED] wrote:
 What "New search" do you mean guys?
 I can't find it on this page.
 
 
 Caffeinate The World wrote:
  
  oops ignore my last post, i forgot to use New Search. yes you are
  right. wow. yikes i mean. i don't know what's going on there.
  
  i don't think substring search is supported in cache mode.
  
  --- Zenon Panoussis [EMAIL PROTECTED] wrote:
   Author: Zenon Panoussis
   Email: [EMAIL PROTECTED]
   Message:
  
 The search works very nicely, but it returns a tremendous
 amount of quoted document data...
  
Can I take a look on your search page?
  
   Yes. Go to http://search.freewinds.cx and use "New search".
   Search for the word "something" and format "Long" and you'll
   get a results page that's almost half a megabyte.
  
   BTW, there is some other strange behaviour there. Searching
   for beginning of word or substring doesn't work at all. Ispell
   is not enabled, but as I understand it doesn't need to be either.






UdmSearch: shared lib uses wrong path

2001-02-02 Thread Caffeinate The World

when using:

--enable-shared

all client programs of mnogosearch look for their library in ".libs"
instead of "$PREFIX/lib"

# ./search.cgi
Cannot open ".libs/libudmsearch.so"

# indexer -h
Cannot open ".libs/libudmsearch.so"





Re: UdmSearch: php-mnogo

2001-02-02 Thread Caffeinate The World


--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Does search.cgi work fine with  "Minneapolis Elected Officials" ?

No, search.cgi doesn't. It appears as though pluralized words aren't
searched properly; i.e. "official" works, but not "officials". In
addition, any capitalization will not work. I do have "IspellMode db"
and "StopwordTable stopword" set, so this seems like a problem with
suffixes and ispell mode. I'm using 3.1.9.


 
 Caffeinate The World wrote:
  
  i modified libtool a bit and it compiled and apache didn't
 complain.
  i'll try making a sharedlib later. but upon testing it. it's VERY
 FAST.
  i'm using cache mode and it's a few folds faster than the CGI
 version.
  also my db is pgsql. when i say fast, i mean REALLY REALLY FAST.
 and
  this is with a server load of about 4.. average load is about 1 or
 .90.
  i'm running serveral indexers now.
  
  something that is strange is this:
  
  http://search.minnesota.com/test.php
  search for these words: city council minneapolis official
  
  first entry will be:
  
  ---cut---
1. http://www.ci.minneapolis.mn.us/citywork/elected.html
   CONT : text/html
   TITLE: Minneapolis Elected Officials
   KEYWORDS: Minneapolis, City of Minneapolis, Minnesota, Twin
  Cities, City of Lakes, City Government, MN, MPLS, Municipal
 Government,
  Municipality, Local Government, Govern
   DESC: This is the official web site for the City of
 Minneapolis,
  Minnesota, USA. As a round-the-clock ser
   TEXT: Minneapolis Elected Officials View the City of
  Minneapolis Goals   1999 Goals  2000 Goals  2001 GoalsMayor 
 Sharon
  Sayles BeltonCouncil Members About the City Council
 (roles
  and responsibilities)Ward 1 - Paul Ostrow  Ward 2 - Joan
   SIZE : 9456
   MODIFIED : 979051812
   URLID : 517899
   SCORE : 4
  ---/cut---
  
  look at the line "TEXT:"
  
  now if you use "Minneapolis Elected Officials" as your new search
  words, it will return 0 documents found. why?
  
  one thing to note is i had to wipe out my ./var/tree, keep the URLs
 in
  the db, expire all of the URLs, and 'indexer -m' to reindex them.
 this
  process is VERY slow.. it seems as though it's 20 times slower than
  when i initially started w/o any URLs in the DB yet, just "server"
  commands. currently there are about 1/2 million urls in there,
 about
  10,000 has been indexed.
  
  --- Sergey Kartashoff [EMAIL PROTECTED] wrote:
   Hi!
  
   Friday, February 02, 2001, 8:17:21 AM, you wrote:
  
   CTW i took out "-ludmsearch" from LIBS. recompiled:
  
   CTW those functions are still Undefined. for some reason the
   warnings seem
   CTW ti indicate that it's looking for a shared libudmsearch.so?
  
 ok, we will discuss this problem.
 Maybe this is because you are using -export-dynamic in your
 ldflags. Anyway, you can try to compile/install libudmsearch as a
 shared library
 by using the --enable-shared configure switch while configuring
 mnogosearch. Try reinstalling it as a shared library and
 reconfigure/recompile/reinstall php.
  
   --
   Regards, Sergey aka gluke.
  
  
  






Re: Re[2]: UdmSearch: php-mnogo-0.6

2001-02-02 Thread Caffeinate The World


--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
 Hi!
 
 Friday, February 02, 2001, 6:20:39 PM, you wrote:
 
 CTW this doesn't yet support ispell suffix or prefix mode does it? 
 
 No, it will be done soon.
 
 CTW maybe this is why searches fail on pluralized words. also,
 search will
 CTW fail on any words with one or more letters capitalized.
 
 This is strange. Have you setup UDM_PARAM_CHARSET correctly ?

Udm_Set_Agent_Param($udm,UDM_PARAM_CHARSET,"iso-8859-1");   

By the way, what is the default charset if you don't specify one in
indexer.conf?






Re: Re[4]: UdmSearch: php-mnogo-0.6

2001-02-02 Thread Caffeinate The World


--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
 Hi!
 
 Friday, February 02, 2001, 7:23:58 PM, you wrote:
 
  CTW maybe this is why searches fail on pluralized words. also,
  search will
  CTW fail on any words with one or more letters capitalized.
  
  This is strange. Have you setup UDM_PARAM_CHARSET correctly ?
 
 CTW Udm_Set_Agent_Param($udm,UDM_PARAM_CHARSET,"iso-8859-1");   

 
 ok, i checked it. It is really a bug (with not finding capitalized
 words). I will try to fix it. Thank you!

Glad to hear! Good job! If you don't mind my ranting, I'm here to find
bugs ;-) Just don't let my abundance of postings get to you. Thanks,
and keep up the wonderful work. By the way, it must be rather late in
your country now? It's noon here. Sleep does the body good; I tried it
last night, and I feel better today.





solution (Re: UdmSearch: splitter core dump)

2001-02-02 Thread Caffeinate The World


--- Alexander Barkov [EMAIL PROTECTED] wrote:
 No, unfortunately we haven't found bug yet :-(
 Your last debug information should help.

It appears I was right about size_t causing the problem on the Alpha.
Again, sizeof(size_t) on the Alpha is 8, while on i386 it's 4. I'm not
exactly sure what contributed to the core dump, but my feeling is that,
because of the difference in size, cachelogd wrote the wrong records to
disk for certain words. The size difference also affects splitter.
With this bug, record 77C.log always had the largest size; again, I'm
not sure why. While most other records were less than 10K for each
splitter run, 77C.log ranged from 100K to over 400K.

I started over from scratch by re-indexing everything. I tried using
the changed splitter and cachelogd on the existing ./var/tree data, but
it caused more core dumps, not at 77C but at other locations. I believe
the existing data were tainted, so splitter core dumped when it checked
them during its comparison or delete passes.

Before that, I made some changes to "cache.c" and "cachelogd.c". In
"cache.c" I replaced all occurrences of "size_t" with "u_int32_t"; in
"cachelogd.c" I replaced all "size_t" with "unsigned int". Please note
that replacing "size_t" with "u_int32_t" in "cachelogd.c" will result
in extremely high and ever-increasing server load; mine went from 1 to
over 36 after trying that.

After erasing ./var/tree (is there a faster way than rm -rf?) and
starting the new cachelogd, I started indexer. It has been running for
3 days; I've used splitter 4 times and have yet to get a core dump.
I've tested this on raw data of about 2 MB or less and have not let it
climb to 30 MB like before. I'll do that soon, but the indexing process
is extremely slow (4 indexers running, not threaded). Maybe it's
because of the 1/2 million expired URLs in pgsql's db.
 
 Caffeinate The World wrote:
  
  hi alex,
  
  could you let me know if you found anything and if you
  have a patch for 3.1.9pre13. i have indexers still
  going and just building up files and i can't splitter
  them large files unless i attend to the computer and
  watch the size of the logs. thanks.
  
  --- Alexander Barkov [EMAIL PROTECTED] wrote:
   We are trying to discover this bug now.
  
  
   Caffeinate The World wrote:
   
mnogosearch 3.1.9-pre13, pgsql 7.1-current,
netbsd/alpha 1.5.1-current
   
running cachemode. i've been indexing and
   splitter-ing
just fine. 'til today when after an overnight of
indexers running and gathering up a log file  of
   over
31 MB, cachelogd automatically started a new log
   file.
   
i ran 'splitter -p' on that 31 MB log file. it was
split up just fine. then i ran 'splitter' and it
   core
dumped almost half way thru.
   
cut
...
Delete from cache-file
   
  
  /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE1000
Delete from cache-file
   
  
  /usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE2000
   
  
  /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000
old:   2 new:   4 total:   6
   
  
  /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000
old:   0 new:   1 total:   1
   
  
  /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000
old:   0 new:   2 total:   2
   
  
  /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000
old:   0 new:   1 total:   1
   
  
  /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000
old:   1 new:   1 total:   2
   
  
  /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3
old:27049 new:13718 total:40767
Segmentation fault - core dumped
/cut
   
here is the backtrace:
   
cut
...
#0  0x120018c44 in UdmSplitCacheLog (log=
Cannot access memory at address 0x121f873bc.
) at cache.c:591
 591            table[header.ntables].pos=pos;
(gdb) bt
#0  0x120018c44 in UdmSplitCacheLog (log=
Cannot access memory at address 0x121f873bc.
) at cache.c:591
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address
0xc712f381000470e1
/cut
   
sorry i don't think i compiled splitter with debug
flag on so i don't have much more info.
   
here is the filesizes:
   
-rw-r--r--  1 root  wheel   4 Jan 14 10:56
   77A.log
-rw-r--r--  1 root  wheel   11732 Jan 14 10:56
   77B.log
-rw-r--r--  1 root  wheel  465360 Jan 14 10:56
   77C.log
   ^^
   ^^^
-rw-r--r--  1 root  wheel   73696 Jan 14 10:56
   77D.log
-rw-r--r--  1 root  wheel   22764 Jan 14 10:56
   77E.log
   
notice 77C.log, that's where it core dumped. it's
unusually large.
   
i think there is a bug in splitter. how do i
   continue
with the splitter process at this point so that
77C.log and others get processed?



Re: UdmSearch: php-mnogo

2001-02-01 Thread Caffeinate The World

Was the problem that caused a segmentation fault fixed?

--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
 Hi!
 
   Here is the php4 extension module which adds native libudmsearch
   function support for php. We uploaded it to the PHP CVS source
   tree, so it is expected that this module will be included in the
   next 4.0.5 release of php.
 
   This module is currently in an alpha state, but it is working and
   can be used already. For now it contains only basic support of
   libudmsearch features, but we are working on it and updating it in
   the PHP CVS source tree.
 
   I am sending the module to the list in its current state. Please
   feel free to report any bugs you find. Documentation for this module
   is currently unavailable; it will be done after some time. All the
   functions it supports are shown in the test.php example script.
 
   Installation instructions:
   1. create ext/mnogosearch directory at the php sources
   2. unpack all files from this tarball at the ext/mnogosearch
   3. delete configure and main/php_config.h.in scripts from php
  sources
   4. run buildconf script to recreate configure,  makefile templates
  and php_config.h.in files
   5. Now you can run configure --with-mnogosearch=dir
 --with-mysql=dir
  ... according to your needs.
   6. # make; make install
   7. Test php-mnogo functions with test.php script located in the
  tarball
 
 -- 
 Regards, Sergey aka gluke.

 ATTACHMENT part 2 application/x-compressed name=php-mnogo-0.5.tgz







Re: Re[4]: UdmSearch: php-mnogo

2001-02-01 Thread Caffeinate The World

i took out "-ludmsearch" from LIBS. recompiled:

...
gmake[1]: Entering directory
`/home/staffs/t/tom/work/php/php4-current/php4'
/bin/sh /home/staffs/t/tom/work/php/php4-current/php4/libtool --silent
--mode=compile gcc  -I. -I
/home/staffs/t/tom/work/php/php4-current/php4/
-I/home/staffs/t/tom/work/php/php4-current/php4/ma
in -I/home/staffs/t/tom/work/php/php4-current/php4
-I/usr/pkg/include/httpd -I/home/staffs/t/tom/
work/php/php4-current/php4/Zend -I/usr/pkg/include/freetype
-I/usr/pkg/include -I/usr/local/inclu
de -I/usr/local/include/mysql -I/usr/local/install/Sablot-0.44/include
-I/home/staffs/t/tom/work/
php/php4-current/php4/ext/xml/expat/xmltok
-I/home/staffs/t/tom/work/php/php4-current/php4/ext/xm
l/expat/xmlparse -I/home/staffs/t/tom/work/php/php4-current/php4/TSRM 
-DNETBSD -DEAPI -DUSE_EXPA
T -I/usr/pkg/include -DXML_BYTE_ORDER=12 -g -O2  -c stub.c
/bin/sh /home/staffs/t/tom/work/php/php4-current/php4/libtool --silent
--mode=link gcc  -I. -I/ho
me/staffs/t/tom/work/php/php4-current/php4/
-I/home/staffs/t/tom/work/php/php4-current/php4/main
-I/home/staffs/t/tom/work/php/php4-current/php4
-I/usr/pkg/include/httpd -I/home/staffs/t/tom/wor
k/php/php4-current/php4/Zend -I/usr/pkg/include/freetype
-I/usr/pkg/include -I/usr/local/include
-I/usr/local/include/mysql -I/usr/local/install/Sablot-0.44/include
-I/home/staffs/t/tom/work/php
/php4-current/php4/ext/xml/expat/xmltok
-I/home/staffs/t/tom/work/php/php4-current/php4/ext/xml/e
xpat/xmlparse -I/home/staffs/t/tom/work/php/php4-current/php4/TSRM 
-DNETBSD -DEAPI -DUSE_EXPAT -
I/usr/pkg/include -DXML_BYTE_ORDER=12 -g -O2  -Wl,-export-dynamic
-Wl,-R/usr/lib -L/usr/lib -Wl,-
R/usr/pkg/lib -L/usr/pkg/lib -Wl,-R/usr/local/lib -L/usr/local/lib
-Wl,-R/usr/X11R6/lib -L/usr/X1
1R6/lib -o libphp4.la -rpath
/home/staffs/t/tom/work/php/php4-current/php4/libs -avoid-version -L
/usr/pkg/lib -L/usr/local/install/pgsql-current/lib
-L/usr/local/install/mnogosearch-3.1.9/lib -L
/usr/local/lib/mysql -L/usr/local/lib
-L/usr/local/install/Sablot-0.44/lib -Wl,-export-dynamic -W
l,-R/usr/lib -L/usr/lib -Wl,-R/usr/pkg/lib -L/usr/pkg/lib
-Wl,-R/usr/local/lib -L/usr/local/lib -
Wl,-R/usr/X11R6/lib -L/usr/X11R6/lib -R /usr/pkg/lib -R
/usr/local/install/pgsql-current/lib -R /
usr/local/install/mnogosearch-3.1.9/lib -R /usr/local/lib/mysql -R
/usr/local/lib -R /usr/local/i
nstall/Sablot-0.44/lib stub.lo  Zend/libZend.la sapi/apache/libsapi.la
main/libmain.la  ext/gd/li
bgd.la ext/mnogosearch/libmnogosearch.la ext/mysql/libmysql.la
ext/pcre/libpcre.la ext/pgsql/libp
gsql.la ext/posix/libposix.la ext/sablot/libsablot.la
ext/session/libsession.la ext/sockets/libso
ckets.la ext/standard/libstandard.la ext/sysvsem/libsysvsem.la
ext/sysvshm/libsysvshm.la ext/xml/
libxml.la ext/zlib/libzlib.la TSRM/libtsrm.la -lz -lxmltok -lxmlparse
-lsablot -lpq -lmysqlclient
 -ludmsearch -lpq -lcrypt -lttf -lpng -lz -lgd -lresolv -lm -lcrypt
-lgd -lpng -lz -lm -lc -lpng
-ljpeg -lz -lttf -lintl -lXpm -lX11 -lresolv -lgcc

*** Warning: This library needs some functionality provided by
-ludmsearch.
*** I have the capability to make that library automatically link in
when
*** you link to this library.  But I can only do this if you have a
*** shared version of the library, which you do not appear to have.
*** Warning: This library needs some functionality provided by -lgcc.
*** I have the capability to make that library automatically link in
when
*** you link to this library.  But I can only do this if you have a
*** shared version of the library, which you do not appear to have.

*** Warning: This library needs some functionality provided by
-ludmsearch.
*** I have the capability to make that library automatically link in
when
*** you link to this library.  But I can only do this if you have a
*** shared version of the library, which you do not appear to have.

*** Warning: This library needs some functionality provided by -lgcc.
*** I have the capability to make that library automatically link in
when
*** you link to this library.  But I can only do this if you have a
*** shared version of the library, which you do not appear to have.
*** The inter-library dependencies that have been dropped here will be
*** automatically added whenever a program is linked with this library
*** or is declared to -dlopen it.
gmake[1]: Leaving directory
`/home/staffs/t/tom/work/php/php4-current/php4'
Making all in pear
gmake[1]: Entering directory
`/home/staffs/t/tom/work/php/php4-current/php4/pear'
gmake[1]: Leaving directory
`/home/staffs/t/tom/work/php/php4-current/php4/pear'
# cd .libs
# ls
libphp4.la  libphp4.lai libphp4.so
# ls -l
total 4178
lrwxr-xr-x  1 root  users   13 Feb  2 00:11 libphp4.la -> ../libphp4.la
-rw-r--r--  1 root  users 1447 Feb  2 00:11 libphp4.lai
-rwxr-xr-x  1 root  users  4266206 Feb  2 00:11 libphp4.so
# nm *.so | grep Udm
 U UdmAllocAgent
 U UdmAllocEnv
 U UdmDBErrorCode
 U UdmDBErrorMsg
 U 

Re: Re[2]: UdmSearch: php-mnogo

2001-02-01 Thread Caffeinate The World


--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
 Hi!
 
 CTW was the problem which caused a segmentation fault fixed?
 
 The problem was the mysql library bundled with php.
 If you compile php with --with-mysql it uses its own library to
 access
 mysql. And if you compile it with --with-mysql=DIR, then it uses the
 native mysqlclient library. If you use the native mysql library, then
 everything should be ok.

I just recompiled the Apache 1.3.12 PHP module with PHP from CVS as of
today. It was compiled with:

export LIBS="-ludmsearch -lgd -lpng -lz -lm -lc -lpng -ljpeg -lz -lttf
-lintl -lXpm -lX11"  \
export LDFLAGS="-Wl,-export-dynamic -Wl,-R/usr/lib -L/usr/lib
-Wl,-R/usr/pkg/lib \
-L/usr/pkg/lib -Wl,-R/usr/local/lib -L/usr/local/lib
-Wl,-R/usr/X11R6/lib -L/usr/X11R6/lib"

./configure \
--with-apxs \
--with-sablot=/usr/local/install/Sablot-0.44 \
--with-mnogosearch=/usr/local \
--with-pgsql=/usr/local \
--with-mysql=/usr/local \
--enable-libgcc \
--with-gnu-ld \
--with-zlib \
--with-system-regex \
--with-config-file-path=/usr/local/etc \
--enable-track-vars \
--enable-force-cgi-redirect \
--enable-discard-path \
--enable-memory-limit \
--enable-sysvsem \
--enable-sysvshm \
--enable-sockets \
--with-gd=/usr/pkg \
--with-ttf=/usr/pkg \
--enable-freetype-4bit-antialias-hack

mnogosearch is 3.1.9, php-monogo is 0.5 from your email to the list.
upon restarting apache, i get:

  Undefined symbol UdmFreeAgent

and apache didn't start up.

# nm libphp4.so | grep Udm
 U UdmAllocAgent
 U UdmAllocEnv
 U UdmDBErrorCode
 U UdmDBErrorMsg
 U UdmEnvSetDBAddr
 U UdmEnvSetDBMode
 U UdmFind
 U UdmFreeAgent
 U UdmFreeResult
 U UdmGetCharset
 U UdmInit

note they are all undefined. strange. 

# ls /usr/local/lib/libudm*
/usr/local/lib/libudmsearch.a   /usr/local/lib/libudmsearch.la

so these should have been statically linked.

# nm /usr/local/lib/libudmsearch.a | grep UdmFreeAgent
01a0 T UdmFreeAgent

Please note that I was able to compile php-mnogo-0.1, Apache started
fine, and I was able to run the test.php script, using the exact same
procedures. I don't know why I'm seeing the undefined symbol with v0.5.
Also, v0.1 did segfault on me. 

NetBSD/Dec-Alpha 1.5
mnogosearch 3.1.9
php-mnogosearch 0.5





Re: Re[6]: UdmSearch: php-mnogo

2001-02-01 Thread Caffeinate The World

I modified libtool a bit, it compiled, and Apache didn't complain.
I'll try making a shared lib later, but upon testing it: it's VERY
FAST. I'm using cache mode and it's several times faster than the CGI
version, and my db is pgsql. When I say fast, I mean REALLY REALLY
FAST, and this is with a server load of about 4 (average load is about
1 or 0.90). I'm running several indexers now.

something that is strange is this:

http://search.minnesota.com/test.php
search for these words: city council minneapolis official

first entry will be:

---cut---
  1. http://www.ci.minneapolis.mn.us/citywork/elected.html
 CONT : text/html
 TITLE: Minneapolis Elected Officials
 KEYWORDS: Minneapolis, City of Minneapolis, Minnesota, Twin
Cities, City of Lakes, City Government, MN, MPLS, Municipal Government,
Municipality, Local Government, Govern
 DESC: This is the official web site for the City of Minneapolis,
Minnesota, USA. As a round-the-clock ser
 TEXT: Minneapolis Elected Officials View the City of
Minneapolis Goals   1999 Goals  2000 Goals  2001 GoalsMayor  Sharon
Sayles BeltonCouncil Members About the City Council (roles
and responsibilities)Ward 1 - Paul Ostrow  Ward 2 - Joan
 SIZE : 9456
 MODIFIED : 979051812
 URLID : 517899
 SCORE : 4
---/cut---

look at the line "TEXT:"

now if you use "Minneapolis Elected Officials" as your new search
words, it will return 0 documents found. why?

One thing to note is that I had to wipe out my ./var/tree, keep the
URLs in the db, expire all of the URLs, and run 'indexer -m' to reindex
them. This process is VERY slow; it seems to be 20 times slower than
when I initially started without any URLs in the DB yet, just "server"
commands. Currently there are about 1/2 million URLs in there, of which
about 10,000 have been indexed.

--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
 Hi!
 
 Friday, February 02, 2001, 8:17:21 AM, you wrote:
 
 CTW i took out "-ludmsearch" from LIBS. recompiled:
 
 CTW those functions are still Undefined. for some reason the
 warnings seem
 CTW ti indicate that it's looking for a shared libudmsearch.so?
 
 ok, we will discuss this problem.
 Maybe this is because you are using -export-dynamic in your
 ldflags. Anyway, you can try to compile/install libudmsearch as a
 shared library
 by using the --enable-shared configure switch while configuring
 mnogosearch. Try reinstalling it as a shared library and
 reconfigure/recompile/reinstall php.
 
 -- 
 Regards, Sergey aka gluke.
 
 






UdmSearch: mnogo what does it mean?

2001-01-30 Thread Caffeinate The World

It's been bugging me for some time now: what does "mnogo" stand for, or
what does it mean?





Re: UdmSearch: splitter still core dumps on 3.1.9

2001-01-30 Thread Caffeinate The World

That was a little premature on my part: it did core dump again at 77C
when I tried to split another log file. Argh.

--- Caffeinate The World [EMAIL PROTECTED] wrote:
 overnight, the "new splitter" using "u_int32_t" was able to split a
 log
 file around 31MB. this is the first time i've seen it able to index
 the
 log at 77C. can you verify that linux and such have "u_int32_t"? if
 it does, i'll submit my patch. this should fix the problem with the
 alpha. also the patch should enable NetBSD to compile cleanly cause
 we
 don't have native threads yet.
 
 i'll do some more tests before i can make this official.
 
 --- Caffeinate The World [EMAIL PROTECTED] wrote:
  just a quick note, i changed all occurances of "size_t" in cache.c
  into
  "u_int32_t" and recompiled splitter. it seems as though it doesn't
  core dump on
  log files like before. note that i had to do "splitter -p" to get
 new
  files in
  ./splitter and then run "splitter". i've only been able to test
 this
  on a small
  set of logs. related to this, i also changed "size_t" in
 cachelogd.c
  to
  "unsigned int". for some reason if i changed it to "u_int32_t" my
  server ran at
  a very hight load.. usually it sits at around 1. but if i ran
  cachelogd with
  "u_int32_t" changes, it ran at over 30 for system load. scary.
  
  
  --- Caffeinate The World [EMAIL PROTECTED] wrote:
   NetBSD/Alpha (64bit). I reported this a while back for
 3.1.9pre13.
  Looks like
   it was not fixed for 3.1.9. I'm using cache mode.
   
   # gdb /usr/local/install/mnogosearch-3.1.9/sbin/splitter
   GNU gdb 4.17
   Copyright 1998 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "alpha--netbsd"...
    (gdb) run -f 77c -t 77c
    Starting program: /usr/local/install/mnogosearch-3.1.9/sbin/splitter -f 77c -t 77c
    /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0B000 old:   0 new:   1
    total:   1
    /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0C000 old:   0 new:   1
    total:   1
    /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old:   0 new:   6
    total:   6
    /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1C000 old:   0 new:   1
    total:   1
    /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1F000 old:   0 new:   1
    total:   1
    /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old:   0 new:   1
    total:   1
    /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old:   0 new:   2
    total:   2
    /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old:   0 new:   1
    total:   1
    /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old:   0 new:   1
    total:   1
    /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old:   0 new:36482
    total:36482
   
   Program received signal SIGSEGV, Segmentation fault.
   0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
    601                         table[header.ntables].wrd_id=logwords[t-1].wrd_id;
   (gdb) bt
   #0  0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
    #1  0x120002ae0 in main (argc=1917, argv=0x1f8c0) at splitter.c:70
    warning: Hit heuristic-fence-post without finding
    warning: enclosing function for address 0x1400013550
    This warning occurs if you are debugging a function without any symbols
    (for example, in a stripped executable).  In that case, you may wish to
    increase the size of the search with the `set heuristic-fence-post' command.

    Otherwise, you told GDB there was a function where there isn't one, or
    (more likely) you have encountered a bug in GDB.
   (gdb) l
    596                 logwords[count].weight=0;
    597
    598                 for(t=1;t<count+1;t++){
    599                     if((logwords[t-1].wrd_id!=logwords[t].wrd_id)||
    600                        (logwords[t-1].weight!=logwords[t].weight)){
    601                         table[header.ntables].wrd_id=logwords[t-1].wrd_id;
    602                         table[header.ntables].weight=logwords[t-1].weight;
    603                         table[header.ntables].pos=pos;
    604                         table[header.ntables].len=t*

Re: UdmSearch: splitter still core dumps on 3.1.9

2001-01-30 Thread Caffeinate The World

another interesting thing to note: using the old log files created by
the "old" cachelogd (size_t instead of unsigned int), if "splitter"
core dumped and i remove the file which caused it, i.e.

rm /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3

and redo "splitter -f 77c -t 77c", it won't core dump anymore. at this
point, i'm thinking that the old cachelogd wrote records that were
wrong because of size_t (sizeof 8 instead of 4 on Alpha), and once a
"wrong" record is written to the tree, the next batch of splitter
processes needs to load 77C3 to delete some records; that's where the
problem occurs. i'm going to restart everything again. this time, i
won't use the "old" log files from the cachelogd which had "size_t".
i'll just stick to the modified cachelogd (with unsigned int) and
splitter with cache.c using "u_int32_t".

--- Caffeinate The World [EMAIL PROTECTED] wrote:
 
 --- Caffeinate The World [EMAIL PROTECTED] wrote:
  that was a little premature on my part. it did core dump again at
 77C
  when i tried to split another log file. argh.
 
 it should be noted that i used log files from 3.1.9pre13. these log
 files were processed with cachelogd where i hadn't changed size_t to
 "unsigned int" yet, in which case it could have written the "wrong"
 record length or something.
 
 the very first batch i processed with the "new" indexer (u_int32_t)
 was created with cachelogd (with size_t changed to unsigned int). that
 batch went fine. no core. then i processed the old 31 MB log from a
 cachelogd where it was still using size_t. this 31 mb log file also
 was written ok too. but when i processed another "older" log file,
 that's when it cored again at 77C. it could be that the older log
 files where cachelogd had size_t are causing problems.
 
  --- Caffeinate The World [EMAIL PROTECTED] wrote:
   overnight, the "new splitter" using "u_int32_t" was able to split
 a
   log
   file around 31MB. this is the first time i've seen it able to
 index
   the
   log at 77C. can you verify that linux and such have "u_int32_t"?
 if
   it does, i'll submit my patch. this should fix the problem with
  the
   alpha. also the patch should enable NetBSD to compile cleanly
 cause
   we
   don't have native threads yet.
   
   i'll do some more tests before i can make this official.
   
   --- Caffeinate The World [EMAIL PROTECTED] wrote:
just a quick note, i changed all occurances of "size_t" in
  cache.c
into
"u_int32_t" and recompiled splitter. it seems as though it
  doesn't
core dump on
log files like before. note that i had to do "splitter -p" to
 get
   new
files in
./splitter and then run "splitter". i've only been able to test
   this
on a small
set of logs. related to this, i also changed "size_t" in
   cachelogd.c
to
"unsigned int". for some reason if i changed it to "u_int32_t"
 my
server ran at
a very hight load.. usually it sits at around 1. but if i ran
cachelogd with
"u_int32_t" changes, it ran at over 30 for system load. scary.


--- Caffeinate The World [EMAIL PROTECTED] wrote:
 NetBSD/Alpha (64bit). I reported this a while back for
   3.1.9pre13.
Looks like
 it was not fixed for 3.1.9. I'm using cache mode.
 
 # gdb /usr/local/install/mnogosearch-3.1.9/sbin/splitter
 GNU gdb 4.17
 Copyright 1998 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "alpha--netbsd"...
 (gdb) run -f 77c -t 77c
 Starting program: /usr/local/install/mnogosearch-3.1.9/sbin/splitter -f 77c -t 77c
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0B000 old:   0 new:   1
 total:   1
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0C000 old:   0 new:   1
 total:   1
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old:   0 new:   6
 total:   6
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1C000 old:   0 new:   1
 total:   1
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1F000 old:   0 new:   1
 total:   1
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old:   0 new:   1
 total:   1
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old:   0 new:   2
 total:   2
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old: 

Re: UdmSearch: Webboard: indexer will not index Site

2001-01-30 Thread Caffeinate The World

how about showing us what your configuration looks like, and how you
are running indexer (with what parameters, etc.)?

--- Werner Bruns [EMAIL PROTECTED] wrote:
 Author: Werner Bruns
 Email: [EMAIL PROTECTED]
 Message:
 Hello there,
  regardless of what I'm trying, the indexer is doing nothing. First I
  modified the indexer.conf (hopefully correctly); all that it did was
  index the file "robots.txt", that's it. Second I used the minimal
  version of the indexer.conf. In between I flushed the DB. Nothing!!!
 
 Database statistics:
 Expired 0
 Total   0
 
  I'm using mnogosearch-3.1.9 and MySQL 3.22.32
 
  So, what am I doing wrong??? 
 
 Reply: http://search.mnogo.ru/board/message.php?id=1192
 






Possible Fix? (Re: UdmSearch: DeleteNoServer still broken in 3.1.9)

2001-01-30 Thread Caffeinate The World

alex or serge, could you look over this patch? i believe this patch
should fix the problem described below:

---cut---
# diff -ru indexer.c.orig indexer.c
--- indexer.c.orig  Tue Jan 30 10:45:03 2001
+++ indexer.c   Tue Jan 30 10:47:29 2001
@@ -368,7 +368,7 @@
}

/* Find correspondent Server record from indexer.conf */
-   if(!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr))){
+   if((!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr)) &&
+       (!CurSrv->delete_no_server))){
        UdmLog(Indexer,UDM_LOG_WARN,"No 'Server' command for url... deleted.");
        if(!strcmp(CurURL.filename,"robots.txt")){
            if(IND_OK==(result=UdmDeleteRobotsFromHost(Indexer,CurURL.hostinfo)))
---/cut---


--- Caffeinate The World [EMAIL PROTECTED] wrote:
 i reported this back in 3.1.9pre13. i have 'DeleteNoServer no' set
 with many
 URL's in my sql db not having associated Server commands. here i just
 tried to
 reindex and i see that my URL is being deleted:
 
 # indexer -m -s 200
 Indexer[2397]: indexer from mnogosearch-3.1.9/PgSQL started with
 '/usr/local/install/mnogosearch-
 3.1.9/etc/indexer.conf'
 jobs
 Indexer[2397]: [1]
 http://www.mnworkforcecenter.org/lmi/pub1/mms/index.htm
 Indexer[2397]: [1] No 'Server' command for url... deleted.
 ^C
 Received signal 2 - exit! (NOTE: i had to Ctrl-C it to stop it from
 deleting more URL's.)
 
 here is my full indexer.conf:
 
 ---cut---
 #Include inc1.conf
 
 DBAddr  pgsql://***:*@/work/
 DBMode cache
 #SyslogFacility local7
 LogdAddr localhost:7000
 LocalCharset iso-8859-1
 Ispellmode db
 StopwordTable stopword
 
 #ServerTable server
 
 DeleteNoServer no
 
 #Allow *
 
 #Disallow NoMatch *.state.mn.us/*
 Disallow http://www.rootsweb.com/~mn*
 Disallow http://www.wxusa.com/*
 Disallow http://www.vitalrec.com/*
 Disallow http://*yahoo.com/*
 Disallow http://*aol.com/*
 Disallow http://www.salescircular.com/*
 Disallow http://*.wellsfargo.com/*
 # Disallow any except known extensions and directory index using
 "regex" match:
 Disallow NoMatch Regex

\/$|\/SMTMall|\.htm$|\.html$|\.shtml$|\.jhtml$|\.phtml$|\.php$|\.php3$|\.asp|\.txt$
 # Exclude cgi-bin and non-parsed-headers using "string" match:
 Disallow */cgi-bin/* *.cgi */nph-*
 # Exclude anything with '?' sign in URL. Note that '?' sign has a
 # special meaning in "string" match, so we have to use "regex" match
 here:
 #Disallow Regex  \?
 
 # Exclude some known extensions using fast "String" match:
 Disallow *.b    *.sh   *.md5  *.rpm
 Disallow *.arj  *.tar  *.zip  *.tgz  *.gz   *.z *.bz2
 Disallow *.lha  *.lzh  *.rar  *.zoo  *.ha   *.tar.Z
 Disallow *.gif  *.jpg  *.jpeg *.bmp  *.tiff *.tif   *.xpm  *.xbm
 *.pcx
 Disallow *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie *.mov  *.dat
 Disallow *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff  *.ra
 Disallow *.vrml *.wrl  *.png
 Disallow *.exe  *.com  *.cab  *.dll  *.bin  *.class *.ex_
 Disallow *.tex  *.texi *.xls  *.doc  *.texinfo
 Disallow *.rtf  *.pdf  *.cdf  *.ps
 Disallow *.ai   *.eps  *.ppt  *.hqx
 Disallow *.cpt  *.bms  *.oda  *.tcl
 Disallow *.o    *.a    *.la   *.so
 Disallow *.pat  *.pm   *.m4   *.am   *.css
 Disallow *.map  *.aif  *.sit  *.sea
 Disallow *.m3u  *.qt   *.mov
 
 # Exclude Apache directory list in different sort order using
 "string" match:
 Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D
 
 # More complicated case. RAR .r00-.r99, ARJ a00-a99 files
 # and unix shared libraries. We use "Regex" match type here:
 Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
 
 #CheckOnly *.b    *.sh   *.md5
 #CheckOnly *.arj  *.tar  *.zip  *.tgz  *.gz
 #CheckOnly *.lha  *.lzh  *.rar  *.zoo  *.tar*.Z
 #CheckOnly *.gif  *.jpg  *.jpeg *.bmp  *.tiff
 #CheckOnly *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie
 #CheckOnly *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff
 #CheckOnly *.vrml *.wrl  *.png
 #CheckOnly *.exe  *.cab  *.dll  *.bin  *.class
 #CheckOnly *.tex  *.texi *.xls  *.doc  *.texinfo
 #CheckOnly *.rtf  *.pdf  *.cdf  *.ps
 #CheckOnly *.ai   *.eps  *.ppt  *.hqx
 #CheckOnly *.cpt  *.bms  *.oda  *.tcl
 #CheckOnly *.rpm  *.m3u  *.qt   *.mov
 #CheckOnly *.map  *.aif  *.sit  *.sea
 #
 # or check ANY except known text extensions using "regex" match:
 #Check NoMatch Regex \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$
 
 #HrefOnly */mail*.html */thread*.html
 
 UseRemoteContentType yes
 
 AddType text/plain  *.txt  *.pl *.js *.h *.c *.pm *.e
 AddType text/html   *.html *.htm *.m
 AddType image/x-xpixmap *.xpm
 AddType image/x-xbitmap *.xbm
 AddType image/gif   *.gif
 AddType Regex \.r[0-9][0-9]$
 AddType application/unknown *.*
 
 #Mime application/msword   "text/plain; charset=cp1251"   "catdoc
 $1"
 #Mime application/x-troff-man  text/plain
 "deroff"
 #Mime text/x-postscripttext/plain
 "ps2ascii"
 
 P

Re: UdmSearch: Webboard: Crash! Tainted prefix dirs

2001-01-30 Thread Caffeinate The World

what in particular crashes? what mode do you use? etc?

--- Mario Gray [EMAIL PROTECTED] wrote:
 Author: Mario Gray
 Email: [EMAIL PROTECTED]
 Message:
 Mnogo 3.1.9 still crashes very often, anyone have this experience as
 well?
 
 Reply: http://search.mnogo.ru/board/message.php?id=1195
 






Re: UdmSearch: Webboard: How to index meta customizedtag=...

2001-01-30 Thread Caffeinate The World


--- Chen Zhang [EMAIL PROTECTED] wrote:
 Author: Chen Zhang
 Email: [EMAIL PROTECTED]
 Message:
 According to the udmsearch documentation, the indexer could grab
 contents in title, meta description, meta keyword,  body , url , url
 path ...
 
 But I have thousands of files with the keywords in the format as 
 
 <meta specialword=" 'name|chen' 'place|new_york'
 'telephone|212_9876374' ">
 
 How to configure or change the source code to index  the keywords
 'name|chen' , 'place|new_york' and 'telephone|212_9876374' into the
 database?
 

<meta Description="...">

 Any suggestions are highly appreciated.
 
 Chen Zhang
 
 Reply: http://search.mnogo.ru/board/message.php?id=1193
 






Re: Possible Fix? (Re: UdmSearch: DeleteNoServer still broken in 3.1.9)

2001-01-30 Thread Caffeinate The World

oops that didn't work. but i'm pretty sure we need to test for the
condition of delete_no_server here. i also tried:

  /* Find correspondent Server record from indexer.conf */
  if(!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr))){
    if(Indexer->Conf->csrv->delete_no_server){
      UdmLog(Indexer,UDM_LOG_WARN,"No 'Server' command for url... deleted.");
      if(!strcmp(CurURL.filename,"robots.txt")){
        if(IND_OK==(result=UdmDeleteRobotsFromHost(Indexer,CurURL.hostinfo)))
          result=UdmLoadRobots(Indexer);
      }else{
        result=IND_OK;
      }
      if(result==IND_OK)result=UdmDeleteUrl(Indexer,Doc->url_id);
      FreeDoc(Doc);
      return(result);
    }
  }

---/cut---

but that didn't work either. any ideas?


--- Caffeinate The World [EMAIL PROTECTED] wrote:
 alex or serge, could you look over this patch? i believe this patch
 should fix this problem described below:
 
 ---cut---
 # diff -ru indexer.c.orig indexer.c
 --- indexer.c.orig  Tue Jan 30 10:45:03 2001
 +++ indexer.c   Tue Jan 30 10:47:29 2001
 @@ -368,7 +368,7 @@
 }
 
 /* Find correspondent Server record from indexer.conf */
 -   if(!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr))){
 +   if((!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr)) &&
 +       (!CurSrv->delete_no_server))){
         UdmLog(Indexer,UDM_LOG_WARN,"No 'Server' command for url... deleted.");
         if(!strcmp(CurURL.filename,"robots.txt")){
             if(IND_OK==(result=UdmDeleteRobotsFromHost(Indexer,CurURL.hostinfo)))
 ---/cut---
 
 
 --- Caffeinate The World [EMAIL PROTECTED] wrote:
  i reported this back in 3.1.9pre13. i have 'DeleteNoServer no' set
  with many
  URL's in my sql db not having associated Server commands. here i
 just
  tried to
  reindex and i see that my URL is being deleted:
  
  # indexer -m -s 200
  Indexer[2397]: indexer from mnogosearch-3.1.9/PgSQL started with
  '/usr/local/install/mnogosearch-
  3.1.9/etc/indexer.conf'
  jobs
  Indexer[2397]: [1]
  http://www.mnworkforcecenter.org/lmi/pub1/mms/index.htm
  Indexer[2397]: [1] No 'Server' command for url... deleted.
  ^C
  Received signal 2 - exit! (NOTE: i had to Ctrl-C it to stop it from
  deleting more URL's.)
  
  here is my full indexer.conf:
  
  ---cut---
  #Include inc1.conf
  
  DBAddr  pgsql://***:*@/work/
  DBMode cache
  #SyslogFacility local7
  LogdAddr localhost:7000
  LocalCharset iso-8859-1
  Ispellmode db
  StopwordTable stopword
  
  #ServerTable server
  
  DeleteNoServer no
  
  #Allow *
  
  #Disallow NoMatch *.state.mn.us/*
  Disallow http://www.rootsweb.com/~mn*
  Disallow http://www.wxusa.com/*
  Disallow http://www.vitalrec.com/*
  Disallow http://*yahoo.com/*
  Disallow http://*aol.com/*
  Disallow http://www.salescircular.com/*
  Disallow http://*.wellsfargo.com/*
  # Disallow any except known extensions and directory index using
  "regex" match:
  Disallow NoMatch Regex
 

\/$|\/SMTMall|\.htm$|\.html$|\.shtml$|\.jhtml$|\.phtml$|\.php$|\.php3$|\.asp|\.txt$
  # Exclude cgi-bin and non-parsed-headers using "string" match:
  Disallow */cgi-bin/* *.cgi */nph-*
  # Exclude anything with '?' sign in URL. Note that '?' sign has a
  # special meaning in "string" match, so we have to use "regex"
 match
  here:
  #Disallow Regex  \?
  
  # Exclude some known extensions using fast "String" match:
  Disallow *.b*.sh   *.md5  *.rpm
  Disallow *.arj  *.tar  *.zip  *.tgz  *.gz   *.z *.bz2
  Disallow *.lha  *.lzh  *.rar  *.zoo  *.ha   *.tar.Z
  Disallow *.gif  *.jpg  *.jpeg *.bmp  *.tiff *.tif   *.xpm  *.xbm
  *.pcx
  Disallow *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie *.mov  *.dat
  Disallow *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff  *.ra
  Disallow *.vrml *.wrl  *.png
  Disallow *.exe  *.com  *.cab  *.dll  *.bin  *.class *.ex_
  Disallow *.tex  *.texi *.xls  *.doc  *.texinfo
  Disallow *.rtf  *.pdf  *.cdf  *.ps
  Disallow *.ai   *.eps  *.ppt  *.hqx
  Disallow *.cpt  *.bms  *.oda  *.tcl
  Disallow *.o*.a*.la   *.so
  Disallow *.pat  *.pm   *.m4   *.am   *.css
  Disallow *.map  *.aif  *.sit  *.sea
  Disallow *.m3u  *.qt   *.mov
  
  # Exclude Apache directory list in different sort order using
  "string" match:
  Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D
  
  # More complicated case. RAR .r00-.r99, ARJ a00-a99 files
  # and unix shared libraries. We use "Regex" match type here:
  Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
  
  #CheckOnly *.b*.sh   *.md5
  #CheckOnly *.arj  *.tar  *.zip  *.tgz  *.gz
  #CheckOnly *.lha  *.lzh  *.rar  *.zoo  *.tar*.Z
  #CheckOnly *.gif  *.jpg  *.jpeg *.bmp  *.tiff
  #CheckOnly *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie
  #CheckOnly *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff
  #CheckOnly *.vrml *.wrl  *.png
  #CheckOnly *.exe  *.cab  *.dll  *.bin  *.class
  #CheckOnly *.tex  *.texi *.xls  *.doc  *.texinfo
 

UdmSearch: Server order

2001-01-29 Thread Caffeinate The World

if indexer follows the order of Server commands in the
indexer.conf file in order to index subsections before
parent sections:

Server http://host/depth1/depth2/
Server http://host/

how do you specify such order in ServerTable used in SQL?





UdmSearch: splitter still core dumps on 3.1.9

2001-01-29 Thread Caffeinate The World

NetBSD/Alpha (64bit). I reported this a while back for 3.1.9pre13. Looks like
it was not fixed for 3.1.9. I'm using cache mode.

# gdb /usr/local/install/mnogosearch-3.1.9/sbin/splitter
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "alpha--netbsd"...
(gdb) run -f 77c -t 77c
Starting program: /usr/local/install/mnogosearch-3.1.9/sbin/splitter -f 77c -t 77c
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0B000 old:   0 new:   1
total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0C000 old:   0 new:   1
total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old:   0 new:   6
total:   6
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1C000 old:   0 new:   1
total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1F000 old:   0 new:   1
total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old:   0 new:   1
total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old:   0 new:   2
total:   2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old:   0 new:   1
total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old:   0 new:   1
total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old:   0 new:36482
total:36482

Program received signal SIGSEGV, Segmentation fault.
0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
601                         table[header.ntables].wrd_id=logwords[t-1].wrd_id;
(gdb) bt
#0  0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
#1  0x120002ae0 in main (argc=1917, argv=0x1f8c0) at splitter.c:70
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0x1400013550
This warning occurs if you are debugging a function without any symbols
(for example, in a stripped executable).  In that case, you may wish to
increase the size of the search with the `set heuristic-fence-post' command.

Otherwise, you told GDB there was a function where there isn't one, or
(more likely) you have encountered a bug in GDB.
(gdb) l
596                 logwords[count].weight=0;
597
598                 for(t=1;t<count+1;t++){
599                     if((logwords[t-1].wrd_id!=logwords[t].wrd_id)||
600                        (logwords[t-1].weight!=logwords[t].weight)){
601                         table[header.ntables].wrd_id=logwords[t-1].wrd_id;
602                         table[header.ntables].weight=logwords[t-1].weight;
603                         table[header.ntables].pos=pos;
604                         table[header.ntables].len=t*sizeof(UDM_CACHEWORD)-pos;
605                         pos+=table[header.ntables].len;
(gdb)





Re: UdmSearch: splitter still core dumps on 3.1.9

2001-01-29 Thread Caffeinate The World

just a quick note, i changed all occurrences of "size_t" in cache.c into
"u_int32_t" and recompiled splitter. it seems as though it doesn't core dump on
log files like before. note that i had to do "splitter -p" to get new files in
./splitter and then run "splitter". i've only been able to test this on a small
set of logs. related to this, i also changed "size_t" in cachelogd.c to
"unsigned int". for some reason if i changed it to "u_int32_t" my server ran at
a very high load.. usually it sits at around 1. but if i ran cachelogd with
"u_int32_t" changes, it ran at over 30 for system load. scary.


--- Caffeinate The World [EMAIL PROTECTED] wrote:
 NetBSD/Alpha (64bit). I reported this a while back for 3.1.9pre13. Looks like
 it was not fixed for 3.1.9. I'm using cache mode.
 
 # gdb /usr/local/install/mnogosearch-3.1.9/sbin/splitter
 GNU gdb 4.17
 Copyright 1998 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "alpha--netbsd"...
 (gdb) run -f 77c -t 77c
 Starting program: /usr/local/install/mnogosearch-3.1.9/sbin/splitter -f 77c -t 77c
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0B000 old:   0 new:   1
 total:   1
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0C000 old:   0 new:   1
 total:   1
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old:   0 new:   6
 total:   6
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1C000 old:   0 new:   1
 total:   1
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1F000 old:   0 new:   1
 total:   1
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old:   0 new:   1
 total:   1
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old:   0 new:   2
 total:   2
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old:   0 new:   1
 total:   1
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old:   0 new:   1
 total:   1
 /usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old:   0 new:36482
 total:36482
 
 Program received signal SIGSEGV, Segmentation fault.
 0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
 601                         table[header.ntables].wrd_id=logwords[t-1].wrd_id;
 (gdb) bt
 #0  0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
 #1  0x120002ae0 in main (argc=1917, argv=0x1f8c0) at splitter.c:70
 warning: Hit heuristic-fence-post without finding
 warning: enclosing function for address 0x1400013550
 This warning occurs if you are debugging a function without any symbols
 (for example, in a stripped executable).  In that case, you may wish to
 increase the size of the search with the `set heuristic-fence-post' command.
 
 Otherwise, you told GDB there was a function where there isn't one, or
 (more likely) you have encountered a bug in GDB.
 (gdb) l
 596                 logwords[count].weight=0;
 597
 598                 for(t=1;t<count+1;t++){
 599                     if((logwords[t-1].wrd_id!=logwords[t].wrd_id)||
 600                        (logwords[t-1].weight!=logwords[t].weight)){
 601                         table[header.ntables].wrd_id=logwords[t-1].wrd_id;
 602                         table[header.ntables].weight=logwords[t-1].weight;
 603                         table[header.ntables].pos=pos;
 604                         table[header.ntables].len=t*sizeof(UDM_CACHEWORD)-pos;
 605                         pos+=table[header.ntables].len;
 (gdb)
 






UdmSearch: more splitter crashes

2001-01-26 Thread Caffeinate The World

i've been seeing splitter coredump consistently at this point:

# /usr/local/install/mnogosearch-3.1.9/sbin/splitter.old -f 92e -t 92e
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E06000 old:   8 new:   1 total:   9
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E0D000 old:  19 new:   3 total:  22
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E12000 old:   7 new:   1 total:   8
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E36000 old:  72 new:   9 total:  81
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E48000 old:  63 new:   3 total:  66
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E5F000 old: 220 new:   1 total: 221
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E7E000 old:   1 new:   1 total:   2
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E97000 old:4044 new:  41 total:4085
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92EA3000 old:  74 new:   1 total:  75
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92EB old: 192 new:   4 total: 196
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92EB1000 old:   5 new:   1 total:   6
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92ED5000 old: 248 new:   4 total: 252
Segmentation fault - core dumped
# gdb /usr/local/install/mnogosearch-3.1.9/sbin/splitter.old splitter.old.core
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "alpha--netbsd"...
Core was generated by `splitter.old'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/libexec/ld.elf_so...done.
Reading symbols from /usr/lib/libcrypt.so.0...done.
Reading symbols from /usr/local/lib/libpq.so.2...done.
Reading symbols from /usr/lib/libc.so.12...done.
#0  UdmSplitCacheLog (log=0) at cache.c:546
Source file is more recent than executable.
546
(gdb) bt
#0  UdmSplitCacheLog (log=0) at cache.c:546
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0x34d4a3a70fd62
This warning occurs if you are debugging a function without any symbols
(for example, in a stripped executable).  In that case, you may wish to
increase the size of the search with the `set heuristic-fence-post' command.

Otherwise, you told GDB there was a function where there isn't one, or
(more likely) you have encountered a bug in GDB.
(gdb) l
541                     int j;
542
543                     /*printf("Read old: %s\n",fname);*/
544                     read(oldfd,&header,sizeof(header));
545                     read(oldfd,table,header.ntables*sizeof(UDM_CACHETABLE));
546
547                     for(w=0;w<header.ntables;w++){
548                         int c=0;
549                         int num=table[w].len/sizeof(UDM_CACHEWORD);
550                         while((r=(num-c))>0){
(gdb)





Re: UdmSearch: Webboard: 'IspellMode db' and Postgres

2001-01-25 Thread Caffeinate The World

---cut---
DROP TABLE "affix";
DROP TABLE "spell";

CREATE TABLE "affix" (
  "flag" character varying(1) DEFAULT '' NOT NULL,
  "type" character varying(1) DEFAULT '' NOT NULL,
  "lang" character varying(3) DEFAULT '' NOT NULL,
  "mask" character varying(32) DEFAULT '' NOT NULL,
  "find" character varying(32) DEFAULT '' NOT NULL,
  "repl" character varying(32) DEFAULT '' NOT NULL
);

CREATE TABLE "spell" (
  "word" character varying(64) DEFAULT '' NOT NULL,
  "flag" character varying(32) DEFAULT '' NOT NULL,
  "lang" character varying(3) DEFAULT '' NOT NULL
);

CREATE  INDEX affix_flag ON affix (flag);
CREATE  INDEX spell_word ON spell (word);
---/cut---
--- Nick Wellnhofer [EMAIL PROTECTED] wrote:
 Author: Nick Wellnhofer
 Email: [EMAIL PROTECTED]
 Message:
 I tried to use the database ispell support
 ('IspellMode db') with Postgres, but i couldn't find
 a create/pgsql/ispell.txt file to create the
 database tables. I tried to modify the ispell.txt
 from the mysql directory, but no success. Does
 anybody know how to get 'IspellMode db' running on
 Postgres? I think it could give some speed
 improvement. 
 
 Nick
 
 
 Reply:
 http://search.mnogo.ru/board/message.php?id=1169
 






Re: UdmSearch: Webboard: MP3 file causes Segmentation fault(core dumped)

2001-01-24 Thread Caffeinate The World

can you provide a backtrace from gdb? w/o it, it would
be hard to track down the problem.

--- Adrift [EMAIL PROTECTED] wrote:
 Author: Adrift
 Email: [EMAIL PROTECTED]
 Message:
 whenever I try to index a MP3 file I get a
 Segmentation fault(core dumped) message and the
 indexer quits... How do I fix this?
 Thanks,
 Ari
 
 
 Reply:
 http://search.mnogo.ru/board/message.php?id=1157
 






Re: UdmSearch: Speed and Indexes...

2001-01-24 Thread Caffeinate The World

i had the same weird problem on pgsql, using crc-multi
mode. i switched to cache mode, now my queries are
under a second.

--- Matthew Sullivan [EMAIL PROTECTED] wrote:
 Hi All,
 
 Just a few thoughts to throw around - currently I am
 running the search to a MySQL backend which is a Sun
 Ultra 10 (Single UltraSpar 440M CPU) with 1 Gig Ram
 and 18Gig of drive space 
 
 If I login to the mysql server and connect to the
 database and perform a search on the word test -
 using the crc-multi indexed data and the sql
 command:
 
 select * from ndict4 where (word_id='-662733300');
 i get: 6844 rows in set (30.97 sec)
 and searching a 2nd time: 6844 rows in set (10.05
 sec)
 ndict4 contains 2051909 rows
 
 if I then search on: 'customer' [select * from
 ndict6 where (word_id='-175892837');]
 the result is: 2264 rows in set (7.51 sec)
 then: 2264 rows in set (3.15 sec)
 ndict6 contains 1415176 rows
 
 TO me this seems an awfully long time to perform
 searches (especially on 1 word) - the mysql server
 has been tuned roughly and currently consumes 400M
 of Physical RAM, and there are 95000ish documents in
 the database - consuming 933M of disk...
 
 Questions:
 1/ would it appear that I need to tune the MySQL
 server further?
 2/ are these search times extended or do they seem
 ok?
 3/ is there anyway of speeding the searches up?
 
 Using UltraSeek on the same words, the results are
 gathered and rendered in under 1 second (the logs
 report the queries takes 350ms)
 
 --
 Yours
 
 Matthew
 

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 I'm really easy to get along with once you people
 learn to worship me.
 







FIX: Re: UdmSearch: php udm module (php-udm.0.1.tar.gz) returned no results

2001-01-21 Thread Caffeinate The World


--- Caffeinate The World [EMAIL PROTECTED]
wrote:
 mnogosearch 3.1.9pre13 (cache mode)
 http://www.izhcom.ru/~bar/php-udm.0.1.tar.gz
 NetBSD/DEC-Alpha 1.5.1
 
 i made this module with php4.0.4. compiled just
 fine.
 i change the db access in udmsearch.php and was able
 to access pgsql fine. but no matter what i change:
 
 // Stage 3: perform search
 
 $res=Udm_Find($udm,"lake");
 
 
 "lake" to, the search always returns:
 
 Documents 1-0 from 0 total found 
 
 i'm using words that i can find with the regular
 search.cgi (C version). is this module compatible
 with
 'cache' mode in 3.1.9pre13?

i guess there is a reason why it's v0.1. here is the
patch to fix it for any mode besides "single". i use
"cache" and it worked with both the php cgi and apache
module.

--- php_udm.c.orig  Sun Jan 21 02:23:57 2001
+++ php_udm.c   Sun Jan 21 02:02:38 2001
@@ -178,6 +178,7 @@
 	Env=UdmAllocEnv();
 	Agent=UdmAllocAgent(Env,0,0);
 	UdmEnvSetDBAddr(Env,dbaddr);
+	UdmEnvSetDBMode(Env,dbmode);
 	ZEND_REGISTER_RESOURCE(return_value,Agent,le_link);
 	}
 	break;

yahoo will probably mess up on the line breaks, but i
hope you get the point.





UdmSearch: anyone using 3.1.9pre13, CacheMode, and on Alpha?

2001-01-20 Thread Caffeinate The World

if you are using 3.1.9pre13, CacheMode, and on Alpha,
can you verify that 'splitter -p' creates only 4095
(000-FFE) instead of 4096 files (000-FFF) in
./var/splitter?

i can't seem to locate the cause in the function
'UdmPreSplitCacheLog()' in ./src/cache.c. my guess is
some kind of type mismatch, i.e. size_t (on the 64-bit
Alpha sizeof gives 8 instead of 4), or the structure
definitions in ./include/udm_cache.h that use time_t.
again, on the Alpha that type is only 4 bytes wide
instead of 8.





Re: UdmSearch: splitter core dump

2001-01-19 Thread Caffeinate The World

further testing shows that it's because size_t is
unsigned int on intel, but on the alpha it's unsigned
long. i'm on the alpha. it's understandable why we'd
overrun the array buffer since i'd have a huge number
on the alpha.

--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Thanks for debugging! This will help us.
 
 
 Caffeinate The World wrote:
  
  it put in a printf to help track down the problem:
  for(t=1;t<count+1;t++){
  /* Debug to test array out of bound */
  printf("Count:%4d, headerntables:%4d,
  Array Index t:%4d\n",count,header.ntables,t);
  
  if((logwords[t-1].wrd_id!=logwords[t].wrd_id)||
  
  (logwords[t-1].weight!=logwords[t].weight)){
  
  table[header.ntables].wrd_id=logwords[t-1].wrd_id;
  
  table[header.ntables].weight=logwords[t-1].weight;
table[header.ntables].pos=pos;
  
 

table[header.ntables].len=t*sizeof(UDM_CACHEWORD)-pos;
pos+=table[header.ntables].len;
header.ntables++;
  }
}
  
  after running splitter on the file 77C.log, i get:
  ...
  Count:35996, headerntables:8328, Array Index
 t:35571
  Count:35996, headerntables:8328, Array Index
 t:35572
  Count:35996, headerntables:8328, Array Index
 t:35573
  Count:35996, headerntables:8329, Array Index
 t:35574
  Segmentation fault - core dumped
  
  looks as if the array index is out of bound?
  
  --- Alexander Barkov [EMAIL PROTECTED] wrote:
   We are trying to discover this bug now.
  
  
   Caffeinate The World wrote:
   
mnogosearch 3.1.9-pre13, pgsql 7.1-current,
netbsd/alpha 1.5.1-current
   
running cachemode. i've been indexing and
   splitter-ing
just fine. 'til today when after an overnight
 of
indexers running and gathering up a log file 
 of
   over
31 MB, cachelogd automatically started a new
 log
   file.
   
i ran 'splitter -p' on that 31 MB log file. it
 was
split up just fine. then i ran 'splitter' and
 it
   core
dumped almost half way thru.
   
cut
...
Delete from cache-file
   
  
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE1000
Delete from cache-file
   
  
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE2000
   
  
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000
old:   2 new:   4 total:   6
   
  
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000
old:   0 new:   1 total:   1
   
  
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000
old:   0 new:   2 total:   2
   
  
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000
old:   0 new:   1 total:   1
   
  
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000
old:   1 new:   1 total:   2
   
  
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3
old:27049 new:13718 total:40767
Segmentation fault - core dumped
/cut
   
here is the backtrace:
   
cut
...
#0  0x120018c44 in UdmSplitCacheLog (log=
Cannot access memory at address 0x121f873bc.
) at cache.c:591
591
 table[header.ntables].pos=pos;
(gdb) bt
#0  0x120018c44 in UdmSplitCacheLog (log=
Cannot access memory at address 0x121f873bc.
) at cache.c:591
warning: Hit heuristic-fence-post without
 finding
warning: enclosing function for address
0xc712f381000470e1
/cut
   
sorry i don't think i compiled splitter with
 debug
flag on so i don't have much more info.
   
here is the filesizes:
   
-rw-r--r--  1 root  wheel   4 Jan 14 10:56
   77A.log
-rw-r--r--  1 root  wheel   11732 Jan 14 10:56
   77B.log
-rw-r--r--  1 root  wheel  465360 Jan 14 10:56
   77C.log
   ^^
   ^^^
-rw-r--r--  1 root  wheel   73696 Jan 14 10:56
   77D.log
-rw-r--r--  1 root  wheel   22764 Jan 14 10:56
   77E.log
   
notice 77C.log, that's where it core dumped.
 it's
unusually large.
   
i think there is a bug in splitter. how do i
   continue
with the splitter process at this point so
 that
77C.log and others get processed?
   






UdmSearch: Re: splitter core dump @ 77C

2001-01-19 Thread Caffeinate The World

sizeof(int)=   4, sizeof(size_t)=   8


--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Caffeinate The World wrote:
  
  did you look into this alex? 
 
   Yes, thank for report. We are trying to find the
 reason of bug.
 
 
  if not, i'll recompile
  with debug on for splitter and will try to locate
 the
  problem myself. i think it has to do with the
 64bit
  platform and wrong expected numbers.
 
   What is  sizeof(int) on your platform?






Re: UdmSearch: no FFF tree in cachemode tree structure

2001-01-19 Thread Caffeinate The World

yes that's correct i'm on NetBSD/Dec-Alpha 64bit. you
guys need to look over the use of size_t. see my other
email message. 

on the Alpha it's unsigned long, on intel it's
unsigned int. this would cause the big difference.



--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Caffeinate The World wrote:
  
  from cachemode.txt
  
   /var/tree/00/0/0
   ...
   /var/tree/00/0/000FF
   ...
   ...
   /var/tree/FF/F/FFF00
   ...
   /var/tree/FF/F/F
  
  in 3.1.9pre13, i've never seen splitter break the
 tree
  into /var/tree/FF/F/ only /var/tree/FF/E/...
 is
  the highest. is that a bug?
 
 Probably this is becaues of Tru64? We'll check the
 code
 agains platform independance.
 
 
  also the filename is actually 8 hex chars instead
 of
  just 5.
 
 It is changed in 3.1.9 sources. Files now are 8
 characters in length.






time_t is int on Alpha (was Re: UdmSearch: no FFF tree in cachemode tree structure)

2001-01-19 Thread Caffeinate The World

so i covered the problems with size_t in my other
email. while trying to figure out why i'm missing the
FFF tree node, i saw in include/udm_cache.h that there
are many references to time_t.

you should know that on 64bit Alpha, time_t is an int
where as on the other platforms, time_t is a long.

so with size_t in cache.c and time_t in
include/udm_cache.h, i can see why i'm having these
problems on my 64bit Alpha.
 
--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Caffeinate The World wrote:
  
  from cachemode.txt
  
   /var/tree/00/0/0
   ...
   /var/tree/00/0/000FF
   ...
   ...
   /var/tree/FF/F/FFF00
   ...
   /var/tree/FF/F/F
  
  in 3.1.9pre13, i've never seen splitter break the
 tree
  into /var/tree/FF/F/ only /var/tree/FF/E/...
 is
  the highest. is that a bug?
 
 Probably this is becaues of Tru64? We'll check the
 code
 agains platform independance.
 
 
  also the filename is actually 8 hex chars instead
 of
  just 5.
 
 It is changed in 3.1.9 sources. Files now are 8
 characters in length.






RE: UdmSearch: Webboard: Search never finds any records...

2001-01-18 Thread Caffeinate The World

ok i'm going to assume you're very new. i mean no
offense by this, but mnogosearch is really confusing
to begin with. 

in the 'etc' dir where you have your indexer.conf
file, you should have another file 'search.htm'. i'm
going to assume you are using search.cgi instead of
the perl or php version.

in search.htm, you need to make sure that 

  DBAddr and DBMode

match those set in indexer.conf. if you don't have
those set correctly you won't get any results back
from your search. that was my problem. i didn't set
DBMode in search.htm to match indexer.conf.


--- John Dispirito [EMAIL PROTECTED] wrote:
 Could you be more specific?  I've used the basic
 settings
 in the the stock conf file and it still doesn't
 work,  but
 when i get a status of the indexer,  its indexed
 like 2200 sites 
 completely, but still no results..
 
 
 -Original Message-
 From: Caffeinate The World
 [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, January 17, 2001 7:54 PM
 To: John Dispirito; [EMAIL PROTECTED]
 Subject: Re: UdmSearch: Webboard: Search never finds
 any records...
 
 
 check your settings in search.htm. i had the exact
 same problem when i first started using mnogosearch.
 
 --- John Dispirito [EMAIL PROTECTED] wrote:
  Author: John Dispirito
  Email: [EMAIL PROTECTED]
  Message:
  I have a problem,  I'm running UDMsearch 3.0.23, 
  and it successfully
  spiders all of my sites (about 150) but whenever I
  try to
  search for anything, it never finds any
 information,
  no matter
  how simple the search query...
  
  My search.conf file is default except for the
  changes to the dbaddr
  line and the crc-multi line. 
  
  my indexer.conf file is here, I've omitted the
 urls
  I'm searching, but
  they were in the format Server http://www.url.org
  
  Any ideas?
  
  
  =-=-=-=-indexer.conf file=-=-=-=-=-=-
  
  #
  # This is indexer.conf sample for 'ftpsearch'
 mode.
  # Indexer will index only the URL but no the
 content
  # of the documents.
  #
  
  DBHost  localhost
  DBName  udmsearch
  DBUser  root
  
  
  # Turn on indexing URL of the documents
  UrlWeight   1
  
  # Do not process robots.txt. It is usually used on
  HTTP servers only
  Robots no
  
  URL INFO WENT HERE
  
  
  # Retrieve only directory list, use HEAD for other
  files.
  CheckOnly [^/]$
  
  # Exclude Apache and Squid directory lists in
  different sort order
  Disallow \?D=A$ \?D=A$ \?D=D$ \?M=A$ \?M=D$ \?N=A$
  \?N=D$ \?S=A$ \?S=D$
  # Exclude ./. and ./.. from directory list
  Disallow /[.]{1,2} /\%2e /\%2f
  
  
  
  
  Reply:
  http://search.mnogo.ru/board/message.php?id=1141
  






Re: UdmSearch: splitter core dump

2001-01-18 Thread Caffeinate The World

i put in a printf to help track down the problem:

  for(t=1;t<count+1;t++){
    /* Debug to test array out of bound */
    printf("Count:%4d, headerntables:%4d, Array Index t:%4d\n",
           count,header.ntables,t);

    if((logwords[t-1].wrd_id!=logwords[t].wrd_id)||
       (logwords[t-1].weight!=logwords[t].weight)){
      table[header.ntables].wrd_id=logwords[t-1].wrd_id;
      table[header.ntables].weight=logwords[t-1].weight;
      table[header.ntables].pos=pos;
      table[header.ntables].len=t*sizeof(UDM_CACHEWORD)-pos;
      pos+=table[header.ntables].len;
      header.ntables++;
    }
  }

after running splitter on the file 77C.log, i get:
...
Count:35996, headerntables:8328, Array Index t:35571
Count:35996, headerntables:8328, Array Index t:35572
Count:35996, headerntables:8328, Array Index t:35573
Count:35996, headerntables:8329, Array Index t:35574
Segmentation fault - core dumped

looks as if the array index is out of bound?

--- Alexander Barkov [EMAIL PROTECTED] wrote:
 We are trying to discover this bug now.
 
 
 Caffeinate The World wrote:
  
  mnogosearch 3.1.9-pre13, pgsql 7.1-current,
  netbsd/alpha 1.5.1-current
  
  running cachemode. i've been indexing and
 splitter-ing
  just fine. 'til today when after an overnight of
  indexers running and gathering up a log file  of
 over
  31 MB, cachelogd automatically started a new log
 file.
  
  i ran 'splitter -p' on that 31 MB log file. it was
  split up just fine. then i ran 'splitter' and it
 core
  dumped almost half way thru.
  
  cut
  ...
  Delete from cache-file
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE1000
  Delete from cache-file
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE2000
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000
  old:   2 new:   4 total:   6
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000
  old:   0 new:   1 total:   1
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000
  old:   0 new:   2 total:   2
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000
  old:   0 new:   1 total:   1
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000
  old:   1 new:   1 total:   2
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3
  old:27049 new:13718 total:40767
  Segmentation fault - core dumped
  /cut
  
  here is the backtrace:
  
  cut
  ...
  #0  0x120018c44 in UdmSplitCacheLog (log=
  Cannot access memory at address 0x121f873bc.
  ) at cache.c:591
  591
   table[header.ntables].pos=pos;
  (gdb) bt
  #0  0x120018c44 in UdmSplitCacheLog (log=
  Cannot access memory at address 0x121f873bc.
  ) at cache.c:591
  warning: Hit heuristic-fence-post without finding
  warning: enclosing function for address
  0xc712f381000470e1
  /cut
  
  sorry i don't think i compiled splitter with debug
  flag on so i don't have much more info.
  
  here is the filesizes:
  
  -rw-r--r--  1 root  wheel   4 Jan 14 10:56
 77A.log
  -rw-r--r--  1 root  wheel   11732 Jan 14 10:56
 77B.log
  -rw-r--r--  1 root  wheel  465360 Jan 14 10:56
 77C.log
 ^^ 
 ^^^
  -rw-r--r--  1 root  wheel   73696 Jan 14 10:56
 77D.log
  -rw-r--r--  1 root  wheel   22764 Jan 14 10:56
 77E.log
  
  notice 77C.log, that's where it core dumped. it's
  unusually large.
  
  i think there is a bug in splitter. how do i
 continue
  with the splitter process at this point so that
  77C.log and others get processed?
 






Re: UdmSearch: Webboard: Search never finds any records...

2001-01-17 Thread Caffeinate The World

check your settings in search.htm. i had the exact
same problem when i first started using mnogosearch.

--- John Dispirito [EMAIL PROTECTED] wrote:
 Author: John Dispirito
 Email: [EMAIL PROTECTED]
 Message:
 I have a problem,  I'm running UDMsearch 3.0.23, 
 and it successfully
 spiders all of my sites (about 150) but whenever I
 try to
 search for anything, it never finds any information,
 no matter
 how simple the search query...
 
 My search.conf file is default except for the
 changes to the dbaddr
 line and the crc-multi line. 
 
 my indexer.conf file is here, I've omitted the urls
 I'm searching, but
 they were in the format Server http://www.url.org
 
 Any ideas?
 
 
 =-=-=-=-indexer.conf file=-=-=-=-=-=-
 
 #
 # This is indexer.conf sample for 'ftpsearch' mode.
 # Indexer will index only the URL but no the content
 # of the documents.
 #
 
 DBHost  localhost
 DBName  udmsearch
 DBUser  root
 
 
 # Turn on indexing URL of the documents
 UrlWeight   1
 
 # Do not process robots.txt. It is usually used on
 HTTP servers only
 Robots no
 
 URL INFO WENT HERE
 
 
 # Retrieve only directory list, use HEAD for other
 files.
 CheckOnly [^/]$
 
 # Exclude Apache and Squid directory lists in
 different sort order
 Disallow \?D=A$ \?D=A$ \?D=D$ \?M=A$ \?M=D$ \?N=A$
 \?N=D$ \?S=A$ \?S=D$
 # Exclude ./. and ./.. from directory list
 Disallow /[.]{1,2} /\%2e /\%2f
 
 
 
 
 Reply:
 http://search.mnogo.ru/board/message.php?id=1141
 






UdmSearch: how do i index mall of america (it uses servlet)

2001-01-17 Thread Caffeinate The World

this site uses some weird servlet and i can't index
it. error i get from indexer is: no content-type in
...

http://www.mallofamerica.com/
http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369&pn=STATIC&frame=main&rs=0&file=General/general.html

i didn't disallow the '?' in indexer.conf and i've
added 'servlet' to:

Disallow NoMatch Regex 
\/$|\.htm$|\.html$|\.shtml$|\.phtml$|\.php$|\.php3$|\.asp$|servlet|\.txt$





Re: UdmSearch: how do i index mall of america (it uses servlet)

2001-01-17 Thread Caffeinate The World

here is more info:

Indexer[9843]: [1]
http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369&pn=STATIC&frame=main&rs=0&file=General/general.html
Indexer[9843]: [1] Realm string 'http://*'
Indexer[9843]: [1] Allow by default
Indexer[9843]: [1] HTTP/1.1 200 ok
Indexer[9843]: [1] Server: Microsoft-IIS/4.0
Indexer[9843]: [1] Date: Thu, 18 Jan 2001 04:23:27 GMT
Indexer[9843]: [1] content-type:text/html
Indexer[9843]: [1] Set-Cookie:LangID=0;Expires=Sat,
18-Jan-2003 04:23:27 GMT;Path=/
Indexer[9843]: [1]
Cache-Control:no-cache="set-cookie,set-cookie2"
Indexer[9843]: [1] Expires:Thu, 01 Dec 1994 16:00:00
GMT
Indexer[9843]: [1] HTTP/1.1 200 ok ? 15968
Indexer[9843]: [1] No Content-type in
'http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369&pn=STATIC&frame=main&rs=0&file=General/general.html'!

as you can see it shows:

Indexer[9843]: [1] content-type:text/html

yet it complains that there was no Content-type. i
know mnogo's comparison is not case-sensitive, so why
the error? is it because of the lack of a space after
the colon?

--- Caffeinate The World [EMAIL PROTECTED]
wrote:
 this site uses some weird servlet and i can't index
 it. error i get from indexer is: no content-type in
 ...
 
 http://www.mallofamerica.com/

http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369pn=STATICframe=mainrs=0file=General/general.html
 
 i didn't disallow the '?' in indexer.conf and i've
 added 'servlet' to:
 
 Disallow NoMatch Regex

\/$|\.htm$|\.html$|\.shtml$|\.phtml$|\.php$|\.php3$|\.asp$|servlet|\.txt$
 






patch for indexer to fix Content-Type [was Re: UdmSearch: how do i index mall of america (it uses servlet)]

2001-01-17 Thread Caffeinate The World

here is a patch for 3.1.9-pre13 to handle the cases
where some web servers don't follow the spec and omit
the space between the ':' and the Content-Type value,
i.e. they send

Content-Type:text/html

instead of

Content-Type: text/html

--- indexer.c.orig  Thu Jan 18 02:06:08 2001
+++ indexer.c   Thu Jan 18 01:44:07 2001
@@ -802,7 +802,8 @@
 	!UDM_STRNCASECMP(sname,"IIS"))
 		Indexer->charset=UDM_CHARSET_CP1251;
 	}else
-	if(!UDM_STRNCASECMP(tok,"Content-Type: ")){
+	if(!UDM_STRNCASECMP(tok,"Content-Type: ")||
+	   !UDM_STRNCASECMP(tok,"Content-Type:")){
 		if (!Indexer->Conf->use_remote_cont_type) {
 			content_type=UdmContentType(Indexer->Conf,Doc->url);
 		}
--- Caffeinate The World [EMAIL PROTECTED]
wrote:
 here is more info:
 
 Indexer[9843]: [1]

http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369pn=STATICframe=main;
 rs=0file=General/general.html
 Indexer[9843]: [1] Realm string 'http://*'
 Indexer[9843]: [1] Allow by default
 Indexer[9843]: [1] HTTP/1.1 200 ok
 Indexer[9843]: [1] Server: Microsoft-IIS/4.0
 Indexer[9843]: [1] Date: Thu, 18 Jan 2001 04:23:27
 GMT
 Indexer[9843]: [1] content-type:text/html
 Indexer[9843]: [1] Set-Cookie:LangID=0;Expires=Sat,
 18-Jan-2003 04:23:27 GMT;Path=/
 Indexer[9843]: [1]
 Cache-Control:no-cache="set-cookie,set-cookie2"
 Indexer[9843]: [1] Expires:Thu, 01 Dec 1994 16:00:00
 GMT
 Indexer[9843]: [1] HTTP/1.1 200 ok ? 15968
 Indexer[9843]: [1] No Content-type in

'http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369p
 n=STATICframe=mainrs=0file=General/general.html'!
 
 as you can see it shows:
 
 Indexer[9843]: [1] content-type:text/html
 
 but yet complains that it didn't have Content-type.
 i
 know mnogo comparison is not case-sensitive. so why
 the error? is it because of the lack of space after
 the colon?
 
 --- Caffeinate The World [EMAIL PROTECTED]
 wrote:
  this site uses some weird servlet and i can't
 index
  it. error i get from indexer is: no content-type
 in
  ...
  
  http://www.mallofamerica.com/
 

http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369pn=STATICframe=mainrs=0file=General/general.html
  
  i didn't disallow the '?' in indexer.conf and i've
  added 'servlet' to:
  
  Disallow NoMatch Regex
 

\/$|\.htm$|\.html$|\.shtml$|\.phtml$|\.php$|\.php3$|\.asp$|servlet|\.txt$
  






UdmSearch: incorrect Follow behavior

2001-01-16 Thread Caffeinate The World

using v3.1.9pre13. i have in short:

DeleteNoServer no
Follow path
Realm *

with no Server variable set for this URL. why does

indexer -i -u
http://www.gorp.com/gorp/location/mn/mn.htm

add other paths from www.gorp.com? let me illustrate:

# indexer -C -u http://www.gorp.com/%
You are going to delete database 'mnwork' content
Are you sure?(YES/no)YES
Deleting...Done
# indexer -i -u
http://www.gorp.com/gorp/location/mn/mn.htm
Indexer[7327]: indexer from
mnogosearch-3.1.9.pre13/PgSQL started with
'/usr/local/install/mnogos
earch-3.1.9/etc/indexer.conf'
Indexer[7327]: [1]
http://www.gorp.com/gorp/location/mn/mn.htm
Indexer[7327]: [1] Done (9 seconds)
---/cut---

mnwork=# select url from url where url like
'http://www.gorp.com/%';
url
---
 http://www.gorp.com/
 http://www.gorp.com/default.htm
 http://www.gorp.com/gorp/about.htm
 http://www.gorp.com/gorp/activity/byway/MN.htm
 http://www.gorp.com/gorp/activity/main.htm

http://www.gorp.com/gorp/activity/paddling/wsr_mwgl.htm
 http://www.gorp.com/gorp/books/main.htm

http://www.gorp.com/gorp/eclectic/family/minn_family.htm
 http://www.gorp.com/gorp/freelance/
 http://www.gorp.com/gorp/gear/main.htm
 http://www.gorp.com/gorp/guide.htm
 http://www.gorp.com/gorp/interact/default.htm
 http://www.gorp.com/gorp/jobs/
 http://www.gorp.com/gorp/jobs/gorpjobs.htm
 http://www.gorp.com/gorp/location/MN/MN.htm
 http://www.gorp.com/gorp/location/MN/MN_e.htm
 http://www.gorp.com/gorp/location/MN/MN_feats.htm
 http://www.gorp.com/gorp/location/MN/MN_links.htm
 http://www.gorp.com/gorp/location/MN/MN_maps.htm
 http://www.gorp.com/gorp/location/MN/MN_ne.htm
 http://www.gorp.com/gorp/location/MN/MN_nw.htm
 http://www.gorp.com/gorp/location/MN/MN_resource.htm
 http://www.gorp.com/gorp/location/MN/MN_se.htm
 http://www.gorp.com/gorp/location/MN/MN_sw.htm
 http://www.gorp.com/gorp/location/MN/MN_w.htm
 http://www.gorp.com/gorp/location/cities/main.htm

http://www.gorp.com/gorp/location/cities/minneapolis.htm
 http://www.gorp.com/gorp/location/main.htm
 http://www.gorp.com/gorp/location/mn/
 http://www.gorp.com/gorp/location/mn/mn.htm

http://www.gorp.com/gorp/location/mn/we_twincities.htm
 http://www.gorp.com/gorp/location/mn/xc_gun.htm
 http://www.gorp.com/gorp/location/us/us.htm
more URLs

if 'Follow path' is set, shouldn't it default to that?
shouldn't it ONLY add URLs like:

http://www.gorp.com/gorp/location/mn/*

which would fall in the same path?





UdmSearch: no FFF tree in cachemode tree structure

2001-01-16 Thread Caffeinate The World

from cachemode.txt

 /var/tree/00/0/0
 ...
 /var/tree/00/0/000FF
 ...
 ...
 /var/tree/FF/F/FFF00
 ...
 /var/tree/FF/F/F

in 3.1.9pre13, i've never seen splitter break the tree
into /var/tree/FF/F/ only /var/tree/FF/E/... is
the highest. is that a bug?

also the filename is actually 8 hex chars instead of
just 5.





Re: UdmSearch: splitter core dump

2001-01-16 Thread Caffeinate The World


--- Alexander Barkov [EMAIL PROTECTED] wrote:
 We are trying to discover this bug now.

i've now had 8 files that made splitter core dump. all
of them at 'var/tree/77/C/77C3'

i hope that was more detail for you. there is a
pattern there.
 
 Caffeinate The World wrote:
  
  mnogosearch 3.1.9-pre13, pgsql 7.1-current,
  netbsd/alpha 1.5.1-current
  
  running cachemode. i've been indexing and
 splitter-ing
  just fine. 'til today when after an overnight of
  indexers running and gathering up a log file  of
 over
  31 MB, cachelogd automatically started a new log
 file.
  
  i ran 'splitter -p' on that 31 MB log file. it was
  split up just fine. then i ran 'splitter' and it
 core
  dumped almost half way thru.
  
  cut
  ...
  Delete from cache-file
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE1000
  Delete from cache-file
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE2000
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000
  old:   2 new:   4 total:   6
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000
  old:   0 new:   1 total:   1
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000
  old:   0 new:   2 total:   2
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000
  old:   0 new:   1 total:   1
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000
  old:   1 new:   1 total:   2
 

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3
  old:27049 new:13718 total:40767
  Segmentation fault - core dumped
  /cut
  
  here is the backtrace:
  
  cut
  ...
  #0  0x120018c44 in UdmSplitCacheLog (log=
  Cannot access memory at address 0x121f873bc.
  ) at cache.c:591
  591
   table[header.ntables].pos=pos;
  (gdb) bt
  #0  0x120018c44 in UdmSplitCacheLog (log=
  Cannot access memory at address 0x121f873bc.
  ) at cache.c:591
  warning: Hit heuristic-fence-post without finding
  warning: enclosing function for address
  0xc712f381000470e1
  /cut
  
  sorry i don't think i compiled splitter with debug
  flag on so i don't have much more info.
  
  here is the filesizes:
  
  -rw-r--r--  1 root  wheel   4 Jan 14 10:56
 77A.log
  -rw-r--r--  1 root  wheel   11732 Jan 14 10:56
 77B.log
  -rw-r--r--  1 root  wheel  465360 Jan 14 10:56
 77C.log
 ^^ 
 ^^^
  -rw-r--r--  1 root  wheel   73696 Jan 14 10:56
 77D.log
  -rw-r--r--  1 root  wheel   22764 Jan 14 10:56
 77E.log
  
  notice 77C.log, that's where it core dumped. it's
  unusually large.
  
  i think there is a bug in splitter. how do i
 continue
  with the splitter process at this point so that
  77C.log and others get processed?
 
 






UdmSearch: getting URLs from sub tree of dmoz without indexing dmoz

2001-01-15 Thread Caffeinate The World

using 3.1.9pre13:

i've been able to figure out most indexing setups.
now i'm stuck trying to get the site URLs listed in
dmoz, and then index only those URLs.

DBAddr  pgsql://user:pass@/mydb/
DBMode cache
LogdAddr localhost:7000
Ispellmode db
StopwordTable stopword
DeleteNoServer no

HrefOnly Match String *dmoz*
Disallow String http://www.dmoz.org/*

#Allow *
Disallow NoMatch Regex
\/$|\.htm$|\.html$|\.shtml$|\.phtml$|\.php$|\.php3$|\.txt$
Disallow */cgi-bin/* *.cgi */nph-*
Disallow Regex  \?

Index yes
Follow path

Server world
http://www.dmoz.org/Regional/North_America/United_States/California/
Realm http://*
---/cut---

that will not work. it will insert the server url
above into the db and that's it; it won't even
traverse the subtree and grab the site urls. i don't
want to index anything at dmoz, just get the urls
listed in each sub category, then index those sites.
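if the String patterns behave like shell globs (an assumption on my part about how Allow/Disallow String matching works, not something the docs in this thread confirm), the config above blocks every dmoz page outright, so the category pages are never fetched and their outbound links are never seen. a small sketch of that interaction:

```python
from fnmatch import fnmatchcase

# patterns from the indexer.conf above; glob semantics are assumed
DISALLOW = "http://www.dmoz.org/*"
HREFONLY = "*dmoz*"

def classify(url):
    """Return (matches HrefOnly pattern, matches Disallow pattern)."""
    return (fnmatchcase(url, HREFONLY), fnmatchcase(url, DISALLOW))

dmoz = "http://www.dmoz.org/Regional/North_America/United_States/California/"
print(classify(dmoz))                             # dmoz hits both patterns
print(classify("http://www.example.com/a.html"))  # external site hits neither
```

since every dmoz category page matches the Disallow pattern, nothing under dmoz is ever downloaded, and therefore no external site URLs can be extracted from it.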





Re: UdmSearch: getting URLs from sub tree of dmoz without indexing dmoz

2001-01-15 Thread Caffeinate The World


--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
 Hi!
 
 Tuesday, January 16, 2001, 8:19:06 AM, you wrote:
 
 CTW HrefOnly Match String *dmoz*
 CTW Disallow String http://www.dmoz.org/*
 
 CTW that will not work. it will insert the server
 url
 CTW above into the db and that's it. won't even
 traverse
 CTW the subtree and grab all the site urls.
 
 It will not work because you disallowed everything
 under http://www.dmoz.org/

that was just one combination i tried. here is another
that will not work under 3.1.9pre13:

DBAddr  pgsql://user:pass@/mnwork/
DBMode cache
LogdAddr localhost:7000
Ispellmode db
StopwordTable stopword
DeleteNoServer no

HrefOnly Match Regex .*dmoz.*Minnesota.*
Allow
http://www.dmoz.org/Regional/North_America/United_States/Minnesota/*
Disallow http://www.dmoz.org/*

#Allow *
Disallow NoMatch Regex
\/$|\.htm$|\.html$|\.shtml$|\.phtml$|\.php$|\.php3$|\.txt$
Disallow */cgi-bin/* *.cgi */nph-*
Disallow Regex  \?

Index yes
Follow path

Server world
http://www.dmoz.org/Regional/North_America/United_States/Minnesota/Weather
Realm http://*

---/cut---

# indexer -u %dmoz%
Indexer[4800]: indexer from mnogosearch-3.1.9.pre13/PgSQL started with
'/usr/local/install/mnogosearch-3.1.9/etc/indexer.conf'
Tue 16 00:49:24 [29262] Client #0 connected
Indexer[4800]: [1]
http://www.dmoz.org/Regional/North_America/United_States/Minnesota/Weather
Indexer[4800]: [1] http://www.dmoz.org/robots.txt
Indexer[4800]: [1] Done (19 seconds)
Tue 16 00:49:43 [29262] Client #0 left

---/cut---

why didn't it pick up the URLs of the sites listed
under the Weather category?





UdmSearch: splitter core dump

2001-01-14 Thread Caffeinate The World

mnogosearch 3.1.9-pre13, pgsql 7.1-current,
netbsd/alpha 1.5.1-current

running cachemode. i've been indexing and
splitter-ing just fine 'til today, when, after an
overnight run in which the indexers gathered up a log
file of over 31 MB, cachelogd automatically started a
new log file.

i ran 'splitter -p' on that 31 MB log file. it was
split up just fine. then i ran 'splitter' and it core
dumped almost half way thru.

cut
...
Delete from cache-file
/usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE1000
Delete from cache-file
/usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE2000
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000
old:   2 new:   4 total:   6
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000
old:   0 new:   1 total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000
old:   0 new:   2 total:   2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000
old:   0 new:   1 total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000
old:   1 new:   1 total:   2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3
old:27049 new:13718 total:40767
Segmentation fault - core dumped
/cut

here is the backtrace:

cut
...
#0  0x120018c44 in UdmSplitCacheLog (log=
Cannot access memory at address 0x121f873bc.
) at cache.c:591
591              table[header.ntables].pos=pos;
(gdb) bt
#0  0x120018c44 in UdmSplitCacheLog (log=
Cannot access memory at address 0x121f873bc.
) at cache.c:591
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address
0xc712f381000470e1
/cut

sorry, i don't think i compiled splitter with the
debug flag on, so i don't have much more info.

here are the file sizes:

-rw-r--r--  1 root  wheel   4 Jan 14 10:56 77A.log
-rw-r--r--  1 root  wheel   11732 Jan 14 10:56 77B.log
-rw-r--r--  1 root  wheel  465360 Jan 14 10:56 77C.log
   ^^  ^^^
-rw-r--r--  1 root  wheel   73696 Jan 14 10:56 77D.log
-rw-r--r--  1 root  wheel   22764 Jan 14 10:56 77E.log

notice 77C.log, that's where it core dumped. it's
unusually large.

i think there is a bug in splitter. how do i continue
with the splitter process at this point so that
77C.log and others get processed?
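until the crash itself is fixed, one cheap guard is to flag per-prefix logs that are wildly larger than their neighbours before feeding them to splitter. a sketch (the 5x-median threshold is an arbitrary choice of mine, not anything splitter uses):

```python
def oversized(sizes, factor=5):
    """Given {filename: bytes}, return logs more than `factor` times
    the median size; 77C.log at 465360 bytes vs 4-73696 byte
    neighbours is exactly the kind of outlier this catches."""
    if not sizes:
        return []
    median = sorted(sizes.values())[len(sizes) // 2]
    return sorted(f for f, s in sizes.items() if s > factor * max(median, 1))

# the listing from this message
sizes = {"77A.log": 4, "77B.log": 11732, "77C.log": 465360,
         "77D.log": 73696, "77E.log": 22764}
print(oversized(sizes))  # ['77C.log']
```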







Re: UdmSearch: splitter core dump

2001-01-14 Thread Caffeinate The World


--- Caffeinate The World [EMAIL PROTECTED]
wrote:
 mnogosearch 3.1.9-pre13, pgsql 7.1-current,
 netbsd/alpha 1.5.1-current
 
 running cachemode. i've been indexing and
 splitter-ing
 just fine. 'til today when after an overnight of
 indexers running and gathering up a log file  of
 over
 31 MB, cachelogd automatically started a new log
 file.
 
 i ran 'splitter -p' on that 31 MB log file. it was
 split up just fine. then i ran 'splitter' and it
 core
 dumped almost half way thru.
 
 cut
 ...
 Delete from cache-file

/usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE1000
 Delete from cache-file

/usr/local/install/mnogosearch-3.1.9/var/tree/77/B/77BE2000

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000
 old:   2 new:   4 total:   6

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000
 old:   0 new:   1 total:   1

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000
 old:   0 new:   2 total:   2

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000
 old:   0 new:   1 total:   1

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000
 old:   1 new:   1 total:   2

/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3
 old:27049 new:13718 total:40767
 Segmentation fault - core dumped
 /cut
 
 here is the backtrace:
 
 cut
 ...
 #0  0x120018c44 in UdmSplitCacheLog (log=
 Cannot access memory at address 0x121f873bc.
 ) at cache.c:591
 591 
  
  table[header.ntables].pos=pos;
 (gdb) bt
 #0  0x120018c44 in UdmSplitCacheLog (log=
 Cannot access memory at address 0x121f873bc.
 ) at cache.c:591
 warning: Hit heuristic-fence-post without finding
 warning: enclosing function for address
 0xc712f381000470e1
 /cut
 
 sorry i don't think i compiled splitter with debug
 flag on so i don't have much more info.
 
 here is the filesizes:
 
 -rw-r--r--  1 root  wheel   4 Jan 14 10:56
 77A.log
 -rw-r--r--  1 root  wheel   11732 Jan 14 10:56
 77B.log
 -rw-r--r--  1 root  wheel  465360 Jan 14 10:56
 77C.log
^^ 
 ^^^
 -rw-r--r--  1 root  wheel   73696 Jan 14 10:56
 77D.log
 -rw-r--r--  1 root  wheel   22764 Jan 14 10:56
 77E.log
 
 notice 77C.log, that's where it core dumped. it's
 unusually large.
 
 i think there is a bug in splitter. how do i
 continue
 with the splitter process at this point so that
 77C.log and others get processed?

what i ended up doing here was moving 77C.log to a
backup location and running "splitter" again. it
continued and processed all the rest of the files
just fine. now, how do i get 77C.log processed? also,
should i have backed up the 'del.log' file too? i
noticed that when splitter did its work, some files
were deleted as well.
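the workaround above can be scripted; this sketch just automates the park/split/restore dance. the install paths are examples, and the `run` parameter exists only so the sketch can be exercised without a real splitter binary -- it is not a splitter feature:

```python
import shutil
import subprocess
from pathlib import Path

# example paths -- adjust to your install
VAR = Path("/usr/local/mnogosearch/var/splitter")
SPLITTER = "/usr/local/mnogosearch/sbin/splitter"

def split_around(bad_log, var=VAR, splitter=SPLITTER,
                 run=subprocess.check_call):
    """Park the crashing per-prefix log, run splitter over the rest,
    then restore the bad log and retry it on its own."""
    parked = var / (bad_log + ".parked")
    shutil.move(str(var / bad_log), str(parked))   # e.g. 77C.log aside
    run([splitter])                                # process everything else
    shutil.move(str(parked), str(var / bad_log))   # bring it back
    run([splitter])                                # retry the bad one alone
```

retrying the oversized log on its own at least isolates the crash; whether splitter can then get through it is exactly the open question in this thread.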





UdmSearch: search.cgi cachemode and php

2001-01-14 Thread Caffeinate The World

a while back someone mentioned piping results from
search.cgi to a php script to display the results and
let a php form interact with search.cgi. i can't seem
to find this in the mailing list or web board using
search.

i have no idea when the php people will add the udm
module so that i can use php to display and run my
queries. the problem is that our site is template
based, changing with each visitor's preferences, so
it would help to have this function.





UdmSearch: Re: http://www.dma.state.mn.us/

2001-01-12 Thread Caffeinate The World

it stalls on 3.1.8 with default timeouts. i just
installed 3.1.9pre and will try that out. on 3.1.8,
reducing the timeouts keeps indexer from stalling.

--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
 Hi!
 
  http://www.pca.state.mn.us/water/basins/mnriver/
 Indexer[22838]: [1] http://www.dma.state.mn.us/  <--
 
  
  The arrow shows where it hangs. Here is what 'ps'
   
 I tried this with mnogoSearch-3.1.9.pre12. It
 correctly says that it
 cannot connect to this site port 80.
 
 -- 
 Regards, Sergey aka gluke.
 
 






UdmSearch: 3.1.9pre13 splitter -p doesn't remove log files

2001-01-12 Thread Caffeinate The World

the manual says:
   B. Preparing cachelogd logs for creating word indexes:

Run splitter with "-p" command line argument:

/usr/local/mnogosearch/sbin/splitter -p

  This operation takes all available logs in the
  /var/raw/ directory, divides the logs into 4096
  parts (one file for each low-level word index
  directory) and stores data acceptable by splitter
  in the /var/splitter/ directory. All processed logs
  in the /var/raw/ directory are removed
  automatically after this operation.

i ran 'splitter -p' and then 'splitter', but the logs
are still there. bug?

# ls -la /data/mn*/var/raw
total 1154
drwxr-xr-x  2 root  wheel 512 Jan 12 16:05 .
drwxr-xr-x  6 root  wheel 512 Jan 12 15:16 ..
-rw-r--r--  1 root  wheel3336 Jan 12 16:05
979337154.del
-rw-r--r--  1 root  wheel  552836 Jan 12 16:05
979337154.wrd
-rw-r--r--  1 root  wheel3648 Jan 12 16:45 del.log
-rw-r--r--  1 root  wheel  591672 Jan 12 16:45 wrd.log

i know the files in ./splitter/* have to be removed
manually though.
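since the manual promises automatic removal, a small check for leftovers in var/raw makes the bug easy to demonstrate. the extension list is my guess at what counts as a raw cachelogd log, based on the directory listing above:

```python
import os

def leftover_raw_logs(raw_dir):
    """Raw cachelogd logs still present after 'splitter -p', which the
    manual says should have been removed automatically."""
    exts = (".wrd", ".del", ".log")
    return sorted(f for f in os.listdir(raw_dir) if f.endswith(exts))
```

run against /data/mn*/var/raw after 'splitter -p', an empty list would mean the manual is right; the listing above shows four leftovers instead.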





Re: UdmSearch: Webboard: Indexer dies at seemingly random intervals

2001-01-11 Thread Caffeinate The World


--- Alexander Barkov [EMAIL PROTECTED] wrote:
 Does indexer hung on local web space or remote
 servers which are 
 far from indexer's machine?

indexing all from remote machines. see this message:

http://search.mnogo.ru/board/message.php?id=1024

in summary, i think there are bugs in checking for
network timeouts or timing out on docs that can't be
retrieved.
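the symptom (indexer and its pgsql backend both sitting idle for hours) is what a blocking network read with no deadline looks like. this is only an illustration of that failure mode in general, not indexer's actual code:

```python
import socket

def head_request(host, port=80, timeout=30):
    """Fetch HTTP response bytes with a hard deadline on connect and on
    every recv(); a server that accepts the connection but never
    answers raises socket.timeout instead of stalling the caller
    forever."""
    s = socket.create_connection((host, port), timeout=timeout)
    s.settimeout(timeout)  # bound each recv(), not just the connect
    try:
        s.sendall(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        return s.recv(4096)
    finally:
        s.close()
```

if indexer times out the connect but not the subsequent reads, a server like www.dma.state.mn.us that accepts connections and then goes silent would hang it exactly as described.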


 
 
 
 mocha wrote:
  
  Author: mocha
  Email: [EMAIL PROTECTED]
  Message:
  i've seen this too. i would start like 10 indexer
 processes and let them run overnight. then when i
 wake up in the morning, i see most are in a stale
 idle state. the associated postgres process is also
 idle.
  
  i'm not sure if it's relevant, but it started
 happening after i reached about 20,000 web pages
 indexed.
  
  here you can see that 3 of the indexers are just
 sitting idle:
  
  22838 p1  I0:33.82 indexer
  18309 p3  Is   0:00.05 -sh
  18312 p3  S0:00.21 sh
  22945 p3  I0:11.62 indexer -l
  22948 p3  I0:14.57 indexer -l
  22951 p3  S0:33.17 indexer -l
  22953 p3  S0:24.93 indexer -l
  
  just last night after indexing for a long time, in
 the morning i found all 10 indexers stalling. i sent
 a 'kill -HUP' to the idling indexer processes, and
 executed indexer again. then they started indexing.
 it's been a few hours and a few are starting to go
 stale or idle again.
  
  i'm on NetBSD/Alpha 1.5.1_ALPHA. again, i didn't
 see this behavior 'til around 20,000 URLs indexed.
  
  # indexer -S
  
UdmSearch statistics
  
  Status     Expired      Total
 -----------------------------------------
     0          8835       9921  Not indexed yet
   200             0      23068  OK
   300             0          1  Multiple Choices
   301             0         13  Moved Permanently
   302             0         32  Moved Temporarily
   401             0          1  Unauthorized
   403             0         15  Forbidden
   404             0        215  Not found
   500             0          1  Internal Server Error
   503             0          6  Service Unavailable
 -----------------------------------------
  Total         8835      33273
  
  i just checked before sending this message, and
 ALL four indexers are stale again:
  
  22838 p1  I0:33.82 indexer
  18309 p3  Is   0:00.05 -sh
  18312 p3  S0:00.21 sh
  22945 p3  I0:11.62 indexer -l
  22948 p3  I0:14.57 indexer -l
  22951 p3  I0:36.71 indexer -l
  22953 p3  I0:25.60 indexer -l
 
 






Re: UdmSearch: Webboard: neil@integrals.co.nz

2001-01-09 Thread Caffeinate The World

they could have kept to one search engine and pooled
their efforts together.

--- Anonymous [EMAIL PROTECTED] wrote:
 Author: Neil Fincham
 Email: 
 Message:
 Has anyone seen ASPseek?
 
 It's available at
 http://www.sw.com.sg/products/aspseek/ -- it looks
 very similar to mnogosearch. Actually it is
 mnogosearch; the thanks file reads:
 
 "We would like to thank developers of UdmSearch
 (now known as MnogoSearch)
 search engine and especially Alexander Barkov who
 started that project
 for the ideas and source code which we used in
 ASPSeek."
 
 It is under the GPL license and they have a few
 advancements that are quite good (phrase search
 being one of them). Perhaps we should port a few of
 the good bits back :-).
 
 
 Reply:
 http://search.mnogo.ru/board/message.php?id=1044
 
 






UdmSearch: anyone using (super fast) cache mode indexing?

2001-01-09 Thread Caffeinate The World

i'm currently using pgsql and mnogosearch 3.1.8 and
searching takes forever -- especially when searching
for multiple words. i looked into the new cache
indexing mode and it seems fantastically fast. you can
try it out at:

http://udm.aspseek.com/cgi-bin/search.cgi

i would like to know if anyone is using it in
production. how do you like it? what about the manual
maintenance steps you have to take? is there any way
to turn the current data that's already indexed in
sql into cache mode, or is reindexing the only way to
do it?

i'm just amazed how fast the results are.





Re: UdmSearch: Webboard: neil@integrals.co.nz

2001-01-09 Thread Caffeinate The World

yes you did. you downloaded it, compiled it, ran it,
and even searched with it. well, your efforts aren't
in vain; you saved me some trouble ;-)


--- Neil [EMAIL PROTECTED] wrote:
  they could have kept with one search engine and
 pool
  their efforts together.
 
 I agree, I have just compiled it; it is radically
 different in a lot of ways. quite easy to crash
 though -- all I have to do is search for more than
 one word at the same time :-) (perhaps I'm doing
 something wrong).
 
 Neil
 
 
 
 






Re: UdmSearch: Webboard: indexing urls in the server table

2001-01-08 Thread Caffeinate The World


--- "L.T. Harris" [EMAIL PROTECTED] wrote:
 Author: L.T. Harris
 Email: [EMAIL PROTECTED]
 Message:
 Thanks, but I'm more confused than ever now.
 
 What is the purpose of the server table that is
 created by the create file server.txt?

well, if you can get the data into that db table, you
can use the command:

ServerTable your_table_name_1 your_table_name_2

in your indexer.conf file to pull in that
information. but back to your original question: if
you are going to use the server table, then you'll
need to specify the follow command as i stated in the
last message.

set 'follow' to:
page - if you want to index only that particular page
site - index the whole site


 I have quite a large number of URLs; I don't want
 to have to put each one in the indexer.conf file.
 (That is what you're saying I have to do, or am I
 just being dumb?)
 
 Thanks!
 
 
 Reply:
 http://search.mnogo.ru/board/message.php?id=1022
 
 






Re: UdmSearch: Indexer hang after 1 to 2 hours

2001-01-08 Thread Caffeinate The World

i'm having the same problem. i emailed the list and
posted on the message board but still no response.
i've read about others having the same problem, so
you aren't alone here.

--- Ernesto Vargas [EMAIL PROTECTED] wrote:
 I am having problems running the indexer for more
 than 1 or 2 hours. It just hangs without any error.
 I was running 5 indexers at the same time, but the
 same thing happened with 1 indexer at a time.
 
 Any suggestions on improving indexer response time?
 
 
 

 






UdmSearch: indexer hangs idle on some webpages

2001-01-06 Thread Caffeinate The World

i'm indexing my state government's web sites. however,
there are some sites that indexer just stalls on. See
the last line below:
...
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/artcl-30.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/artcl-31.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/artcl-32.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/artcl-33.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/artcl-34.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/apndx-a.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/apndx-b.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/apndx-c.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/appd-d-i.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/apndx-j.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/apndx-k.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/apndx-l.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/apndx-m.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/apndx-n.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/apndx-o.htm
Indexer[22838]: [1]
http://www.pca.state.mn.us/water/basins/mnriver/plancomment.html
Indexer[22838]: [1]
http://www.pca.state.mn.us/water/basins/mnriver/mgmt-fw.html
Indexer[22838]: [1]
http://www.pca.state.mn.us/water/basins/mnriver/mnorgs.html
Indexer[22838]: [1]
http://www.pca.state.mn.us/water/basins/mnriver/watersheds.html
Indexer[22838]: [1]
http://www.pca.state.mn.us/water/basins/mnriver/publications.html
Indexer[22838]: [1]
http://www.pca.state.mn.us/water/basins/mnriver/
Indexer[22838]: [1] http://www.dma.state.mn.us/  <--

The arrow shows where it hangs. Here is what 'ps'
shows:

22838 p1  I0:33.82 indexer

it's been idling for over 30 minutes, which in turn
causes the associated pgsql process to idle too.

what could cause this?





UdmSearch: multiple simultaneous indexers

2001-01-03 Thread Caffeinate The World

Am I understanding this correctly? If I don't compile
with pthreads, I can't run multiple indexers at once?
Right now on NetBSD/Alpha we don't have native
threads; would it be possible to change mnogosearch
to support another userland thread package like
gnu-pth, PTL2, or mit-pthreads?
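as a stopgap while there's no threads support, the parallelism can live at the process level, which is what the multiple 'indexer -l' instances in earlier posts already do. a sketch of a launcher (the indexer path is an example; the `spawn` parameter exists only so the sketch can be exercised without a real indexer binary):

```python
import subprocess

INDEXER = "/usr/local/mnogosearch/sbin/indexer"  # example path

def run_parallel(n, argv=(INDEXER, "-l"), spawn=subprocess.Popen):
    """Start n independent single-threaded indexer processes and wait
    for all of them; process-level parallelism needs no pthreads."""
    procs = [spawn(list(argv)) for _ in range(n)]
    return [p.wait() for p in procs]
```

each process gets its own database connection, so this sidesteps userland threading entirely at the cost of more memory per worker.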
