when indexing (version 3.1.10), any URLs with a space (%20) cause
the error:
Too many network errors for this server, skipped
but the URL loads fine in a browser.
...
Indexer[21663]: [1]
http://www.co.dakota.mn.us/socialservices/chcare/COMPLAINTS.htm
Indexer[21663]: [1]
is there.
would it make any difference if the URLs were loaded into the db using
indexer -i -f urls.txt first, and then i changed indexer.conf to have
the mirror settings?
Caffeinate The World wrote:
--- Alexander Barkov [EMAIL PROTECTED] wrote:
That's strange for me. I've just checked this config
--- Zenon Panoussis [EMAIL PROTECTED] wrote:
Caffeinate The World skrev:
i have indexer going but i see nothing in the mirror directories.
when
does it store the pages to the mirror directory?
If your pages are already indexed, when you re-index with -a
indexer will check
--- Alexander Barkov [EMAIL PROTECTED] wrote:
Caffeinate The World wrote:
--- Alexander Barkov [EMAIL PROTECTED] wrote:
Hello!
We finally found a bug in cache.c. The new version is in the attachment.
Everybody who has problems with splitter's crashes is welcome to
test
--- Alexander Barkov [EMAIL PROTECTED] wrote:
Hello!
We finally found a bug in cache.c. The new version is in the attachment.
Everybody who has problems with splitter's crashes is welcome to
test.
should the 'tree' directory be removed? can we split the raw log files
we have thus far or is
i didn't get this error on my NetBSD/Alpha. compile was fine.
what system are you on?
--- Zenon Panoussis [EMAIL PROTECTED] wrote:
Alexander Barkov skrev:
We finally found a bug in cache.c. The new version is in the attachment.
Everybody who has problems with splitter's crashes are
i'm trying to store all web pages locally so i don't have to go fetch
them on the internet each time i re-index.
i have indexer going but i see nothing in the mirror directories. when
does it store the pages to the mirror directory?
# grep Mirror indexer.conf
MirrorRoot
--- Alexander Barkov [EMAIL PROTECTED] wrote:
Alexander Barkov wrote:
i completely forgot about this feature!!! i read about it when i
first
started using mnogosearch, but never bothered to use it.
with the mirror feature, wouldn't it be easy to implement Google's
"cache"
indexer to follow and index anything, which is not what i want. what i'm
looking for is some parameter like DeleteNoServer but for mirroring,
where it would mirror all URLs already in the db or fed to it by an
external list.
Caffeinate The World wrote:
Mirrors command must be used BEFORE Server
--- Alexander Barkov [EMAIL PROTECTED] wrote:
Caffeinate The World wrote:
The only disadvantage is that it will not work on huge
search engines with millions of documents. There is a limit on the total
number of files on the file system in most unixes.
For example, my 30G /usr partition
*.mn.us/* being indexed, but still nothing in the
mirror directories. this is very odd.
Caffeinate The World wrote:
Mirrors commands must be used BEFORE Server commands; they are
per-server commands, so you can use a different mirror location for
different sites
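Putting those two rules together (Mirror settings first, and per-server), a hypothetical indexer.conf sketch; the paths and hostnames here are placeholders, not anything from the thread:

```
# Mirror settings apply to the Server commands that FOLLOW them
MirrorRoot /usr/local/mnogosearch/var/mirror/siteA
Server http://siteA.example.com/

# A different mirror location for another site
MirrorRoot /usr/local/mnogosearch/var/mirror/siteB
Server http://siteB.example.com/
```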
--- Adrift [EMAIL PROTECTED] wrote:
Author: Adrift
Email: [EMAIL PROTECTED]
Message:
every version of mysql I have installed worked perfectly, that is, the
install ran smoothly (I am using FreeBSD 3.4). When I tried to "MAKE"
the new version of mnogosearch, 3.1.10, I got the error:
--- Anonymous [EMAIL PROTECTED] wrote:
Author: pokistu
Email:
Message:
Actually I am running indexer, but it is eating a lot of system
memory. I want to know if I can stop the indexer program with the
TERM signal (linux) without corrupting the DATABASE.
i do it and i've not
is the table 'dict' used at all in cache mode? mine doesn't have any records.
__
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail - only $35
a year! http://personal.mail.yahoo.com/
__
If you want to unsubscribe send
i've been going through this and back again time and time again. what
would really be nice is if indexer saved the logs in a format that's easy
to use again. for instance, you could use the format to re-index to sql etc.
or if you want to reindex again, you wouldn't have to crawl through all
the external
in my tests your 3 little files wouldn't make a difference. he would
have to run splitter -p and splitter on all the files starting from the
first original RAW file, including the 31 MB file. i believe in my
case it was the original 31 MB file which caused the problem.
while processing the
you can try http://aspseek.com, i think that's the other one based on
mnogo. or was it aspsearch.com? argh, i forget. there is also htdig.
check em out. as far as mnogo goes, no one is getting paid for development
here. people spend their time coding and releasing it for free. yes i
agree, the docs
i reported this problem a while back. i believe it's being worked on.
at least they recently found the bug that kept it from splitting out to FFF.
the seg fault happens during the splitter process and not indexing. i've
been running splitter when the logs are at about 2 MB and i've not had
splitter core dump on
, i'm indexing but running splitter when the files
are around 2MB.
--- Zenon Panoussis [EMAIL PROTECTED] wrote:
Caffeinate The World skrev:
I run splitter -p and it finishes fine. I then run splitter and,
halfway through the splitting, it crashes: segmentation fault, or
just a hang, core
--- Alexander Barkov [EMAIL PROTECTED] wrote:
Author: Alexander Barkov
Email: [EMAIL PROTECTED]
Message:
I have a question. I have a server table of about 20,000 urls, and I
was wondering if that is my performance bottleneck. It
seems that Indexer takes all the cpu time on my
:
Caffeinate The World skrev:
i'll wait. for now, i'm indexing but running splitter when the
files
are around 2MB.
I've been running indexer -c 3600 since last night, producing
log files of 5-10 MB and running splitter every time afterwards,
with cleaning of var/splitter and all. So
what machine are you on? Alpha? OS?
i had the same problem and i sent a message to the mailing list
describing how i corrected it. search for "core" and "splitter"
can you check another thing? i've never seen my splitter split the
last file "FFF.log". do you get that file? it goes as high as
--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
Hi!
Friday, February 02, 2001, 8:57:28 AM, you wrote:
CTW i modified libtool a bit and it compiled and apache didn't
complain.
CTW i'll try making a sharedlib later. but upon testing it. it's
VERY FAST.
I have a question: have you
http://search.freewinds.cx/cgi-bin/search2.cgi
--- Alexander Barkov [EMAIL PROTECTED] wrote:
What "New search" do you mean guys?
I can't find it on this page.
Caffeinate The World wrote:
oops ignore my last post, i forgot to use New Search. yes you are
right. wow. yikes
when using:
--enable-shared
all client programs of mnogosearch look for their library in ".libs"
instead of "$PREFIX/lib"
# ./search.cgi
Cannot open ".libs/libudmsearch.so"
# indexer -h
Cannot open ".libs/libudmsearch.so"
y capitalization will not work. i do have "IspellMode db"
and "StopwordTable stopword" set. seems like some problems with suffix
and ispell mode. i'm using 3.1.9.
Caffeinate The World wrote:
i modified libtool a bit and it compiled and apache didn't
complain.
i
--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
Hi!
Friday, February 02, 2001, 6:20:39 PM, you wrote:
CTW this doesn't yet support ispell suffix or prefix mode does it?
No, it will be done soon.
CTW maybe this is why searches fail on pluralized words. also,
search will
CTW fail on
--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
Hi!
Friday, February 02, 2001, 7:23:58 PM, you wrote:
CTW maybe this is why searches fail on pluralized words. also,
search will
CTW fail on any words with one or more letters capitalized.
This is strange. Have you setup
t it
climb to 30mb like before. i'll do that soon here, but the indexing
process is extremely slow (4 indexers running, not threaded). maybe
it's because of the 1/2 million expired urls in pgsql's db.
Caffeinate The World wrote:
hi alex,
could you let me know if you found anything and if yo
was the problem which caused a segmentation fault fixed?
--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
Hi!
Here is the php4 extension module which adds native libudmsearch
functions support for php. We uploaded it at the PHP CVS source
tree, so it is expected that this module will
i took out "-ludmsearch" from LIBS. recompiled:
...
gmake[1]: Entering directory
`/home/staffs/t/tom/work/php/php4-current/php4'
/bin/sh /home/staffs/t/tom/work/php/php4-current/php4/libtool --silent
--mode=compile gcc -I. -I
/home/staffs/t/tom/work/php/php4-current/php4/
--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
Hi!
CTW was the problem which caused a segmentation fault fixed?
The problem was the mysql library bundled with php.
If you compile php with --with-mysql it uses its own library to
access
mysql. And if you compile it with --with-mysql=DIR,
i modified libtool a bit and it compiled, and apache didn't complain.
i'll try making a sharedlib later. but upon testing it: it's VERY FAST.
i'm using cache mode and it's a few times faster than the CGI version.
also my db is pgsql. when i say fast, i mean REALLY REALLY FAST. and
this is with a
it's been bugging me for some time now.. what does mnogo stand for? or what does
it mean?
that was a little premature on my part. it did core dump again at 77C
when i tried to split another log file. argh.
--- Caffeinate The World [EMAIL PROTECTED] wrote:
overnight, the "new splitter" using "u_int32_t" was able to split a
log
file around 31MB. this is the
ess
need to load 77C3 to delete some records, that's where the problem
occurs. i'm going to restart everything again. this time, i won't use
the "old" log files from the cachelogd which had "size_t". i'll just
stick to the modified cachelogd (with unsigned int) and splitter with
how about showing us what your configuration looks like, and how you are
running indexer (with what parameters etc)
--- Werner Bruns [EMAIL PROTECTED] wrote:
Author: Werner Bruns
Email: [EMAIL PROTECTED]
Message:
Hello there,
regardless of what I'm trying, the indexer is doing nothing. First I
leted.");
if(!strcmp(CurURL.filename,"robots.txt")){
if(IND_OK==(result=UdmDeleteRobotsFromHost(Indexer,CurURL.hostinfo)))
---/cut---
--- Caffeinate The World [EMAIL PROTECTED] wrote:
i reported this back in 3.1.9pre13. i have 'DeleteNoServer no' s
what in particular crashes? what mode do you use? etc?
--- Mario Gray [EMAIL PROTECTED] wrote:
Author: Mario Gray
Email: [EMAIL PROTECTED]
Message:
Mnogo 3.1.9 still crashes very often, anyone have this experience as
well?
Reply: http://search.mnogo.ru/board/message.php?id=1195
--- Chen Zhang [EMAIL PROTECTED] wrote:
Author: Chen Zhang
Email: [EMAIL PROTECTED]
Message:
According to the udmsearch documentation, the indexer can grab
contents in title, meta description, meta keyword, body, url, url
path ...
But I have thousands of files with the keywords in
if(result==IND_OK)result=UdmDeleteUrl(Indexer,Doc->url_id);
FreeDoc(Doc);
return(result);
}
}
---/cut---
but that didn't work either. any ideas?
--- Caffeinate The World [EMAIL PROTECTED] wrote:
alex or serge, could you look over this patch? i believe this patch
should fix th
if indexer follows the order of Server command in the
indexer.conf file in order to index subsections before
parent sections:
Server http://host/depth1/depth2/
Server http://host/
how do you specify such order in ServerTable used in SQL?
NetBSD/Alpha (64bit). I reported this a while back for 3.1.9pre13. Looks like
it was not fixed for 3.1.9. I'm using cache mode.
# gdb /usr/local/install/mnogosearch-3.1.9/sbin/splitter
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General
cachelogd with the
"u_int32_t" changes, system load ran at over 30. scary.
--- Caffeinate The World [EMAIL PROTECTED] wrote:
NetBSD/Alpha (64bit). I reported this a while back for 3.1.9pre13. Looks like
it was not fixed for 3.1.9. I'm using cache mode.
# gdb /usr/local/ins
i've been seeing splitter coredump consistently at
this point:
#
/usr/local/install/mnogosearch-3.1.9/sbin/splitter.old
-f 92e -t 92e
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E06000
old: 8 new: 1 total: 9
/usr/local/install/mnogosearch-3.1.9/var/tree/92/E/92E0D000
old: 19 new:
---cut---
DROP TABLE "affix";
DROP TABLE "spell";
CREATE TABLE "affix" (
"flag" character varying(1) DEFAULT '' NOT NULL,
"type" character varying(1) DEFAULT '' NOT NULL,
"lang" character varying(3) DEFAULT '' NOT NULL,
"mask" character varying(32) DEFAULT '' NOT NULL,
"find" character
can you provide a backtrace from gdb? w/o it, it would
be hard to track down the problem.
--- Adrift [EMAIL PROTECTED] wrote:
Author: Adrift
Email: [EMAIL PROTECTED]
Message:
whenever I try to index a MP3 file I get a
Segmentation fault(core dumped) message and the
indexer quits... How do
i had the same weird problem on pgsql, using crc-multi
mode. i switched to cache mode, now my queries are
under a second.
--- Matthew Sullivan [EMAIL PROTECTED] wrote:
Hi All,
Just a few thoughts to throw around - currently I am
running the search to a MySQL backend which is a Sun
Ultra 10
--- Caffeinate The World [EMAIL PROTECTED]
wrote:
mnogosearch 3.1.9pre13 (cache mode)
http://www.izhcom.ru/~bar/php-udm.0.1.tar.gz
NetBSD/DEC-Alpha 1.5.1
i made this module with php4.0.4. compiled just
fine.
i changed the db access in udmsearch.php and was able
to access pgsql fine
if you are using 3.1.9pre13, CacheMode, and on Alpha,
can you verify that 'splitter -p' creates only 4095
(000-FFE) instead of 4096 files (000-FFF) in
./var/splitter?
i can't seem to locate why this happens in the function
'UdmPreSplitCacheLog()' from ./src/cache.c. my guess
is some kind of type
! This will help us.
Caffeinate The World wrote:
i put in a printf to help track down the problem:
for(t=1;t<count+1;t++){
/* Debug to test array out of bound */
printf("Count:%4d, header->ntables:%4d,
Array Index t:%4d\n",count,header
sizeof(int)= 4, sizeof(size_t)= 8
--- Alexander Barkov [EMAIL PROTECTED] wrote:
Caffeinate The World wrote:
did you look into this alex?
Yes, thanks for the report. We are trying to find the
reason for the bug.
if not, i'll recompile
with debug on for splitter and will try
yes that's correct i'm on NetBSD/Dec-Alpha 64bit. you
guys need to look over the use of size_t. see my other
email message.
on the Alpha it's unsigned long, on intel it's
unsigned int. this would cause the big difference.
--- Alexander Barkov [EMAIL PROTECTED] wrote:
Caffeinate The World
with size_t in cache.c and time_t in
include/udm_cache.h, i can see why i'm having these
problems on my 64bit Alpha.
--- Alexander Barkov [EMAIL PROTECTED] wrote:
Caffeinate The World wrote:
from cachemode.txt
/var/tree/00/0/0
...
/var/tree/00/0/000FF
PROTECTED] wrote:
Could you be more specific? I've used the basic settings
in the stock conf file and it still doesn't work, but
when i get a status of the indexer, it's indexed like 2200 sites
completely, but still no results..
-Original Message-
From: Caffeinate The World
ables:8328, Array Index t:35573
Count:35996, header->ntables:8329, Array Index t:35574
Segmentation fault - core dumped
looks as if the array index is out of bounds?
--- Alexander Barkov [EMAIL PROTECTED] wrote:
We are trying to discover this bug now.
Caffeinate The World wrote:
mnogosearch 3
check your settings in search.htm. i had the exact
same problem when i first started using mnogosearch.
--- John Dispirito [EMAIL PROTECTED] wrote:
Author: John Dispirito
Email: [EMAIL PROTECTED]
Message:
I have a problem, I'm running UDMsearch 3.0.23,
and it successfully
spiders all of
this site uses some weird servlet and i can't index
it. the error i get from indexer is: no content-type in
...
http://www.mallofamerica.com/
http://www.mallofamerica.com/moa/servlet/SMTMall?mid=369pn=STATICframe=mainrs=0file=General/general.html
i didn't disallow the '?' in indexer.conf and i've
complains that it didn't have Content-type. i
know mnogo's comparison is not case-sensitive. so why
the error? is it because of the lack of a space after
the colon?
--- Caffeinate The World [EMAIL PROTECTED]
wrote:
this site uses some weird servlet and i can't index
it. error i get from indexer is: no co
MP(tok,"Content-Type: ")||
+
!UDM_STRNCASECMP(tok,"Content-Type:")){
if
(!Indexer->Conf->use_remote_cont_type) {
content_type=UdmContentType(Indexer->Conf,Doc->url);
}
--- Caffeinate The World [EMAIL PROTECTED]
wr
using v3.1.9pre13. i have in short:
DeleteNoServer no
Follow path
Realm *
with no Server variable set for this URL. why does
indexer -i -u
http://www.gorp.com/gorp/location/mn/mn.htm
add other paths from www.gorp.com? let me illustrate:
# indexer -C -u http://www.gorp.com/%
You are going to
from cachemode.txt
/var/tree/00/0/0
...
/var/tree/00/0/000FF
...
...
/var/tree/FF/F/FFF00
...
/var/tree/FF/F/F
in 3.1.9pre13, i've never seen splitter break the tree
into /var/tree/FF/F/ only /var/tree/FF/E/...
--- Alexander Barkov [EMAIL PROTECTED] wrote:
We are trying to discover this bug now.
i've now had 8 files that made splitter core dump, all
of them at 'var/tree/77/C/77C3'
i hope that was more detail for you. there is a
pattern there.
Caffeinate The World wrote:
mnogosearch 3.1.9
using 3.1.9pre13:
i've been able to figure out most situations and how
to index sites. now i'm stuck at trying to get the
site URLs listed in dmoz, and then only index those
urls found.
DBAddr pgsql://user:pass@/mydb/
DBMode cache
LogdAddr localhost:7000
Ispellmode db
StopwordTable
--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
Hi!
Tuesday, January 16, 2001, 8:19:06 AM, you wrote:
CTW HrefOnly Match String *dmoz*
CTW Disallow String http://www.dmoz.org/*
CTW that will not work. it will insert the server
url
CTW above into the db and that's it. won't even
mnogosearch 3.1.9-pre13, pgsql 7.1-current,
netbsd/alpha 1.5.1-current
running cachemode. i've been indexing and splitter-ing
just fine. 'til today when after an overnight of
indexers running and gathering up a log file of over
31 MB, cachelogd automatically started a new log file.
i ran
--- Caffeinate The World [EMAIL PROTECTED]
wrote:
mnogosearch 3.1.9-pre13, pgsql 7.1-current,
netbsd/alpha 1.5.1-current
running cachemode. i've been indexing and
splitter-ing
just fine. 'til today when after an overnight of
indexers running and gathering up a log file of
over
31 MB
a while back someone mentioned piping results from
search.cgi to a php script to display results and
provide for interaction of a php form with search.cgi. i
can't seem to find this in the mailing list or web
board using search.
i have no idea when the php people will add in the udm
module so i
it stalls on 3.1.8 with default timeouts. i just
installed 3.1.9pre and will try that out. on 3.1.8, by
reducing the timeouts, indexer will not stall.
--- Sergey Kartashoff [EMAIL PROTECTED] wrote:
Hi!
http://www.pca.state.mn.us/water/basins/mnriver/
Indexer[22838]: [1]
the manual says:
B. Preparing cachelogd logs for creating word
indexes:
Run splitter with "-p" command line argument:
/usr/local/mnogosearch/sbin/splitter -p
This operation takes all available logs in
/var/raw/ directory,
divides logs into 4096 parts (one file
--- Alexander Barkov [EMAIL PROTECTED] wrote:
Does indexer hang on local web space or on remote
servers which are
far from indexer's machine?
indexing all from remote machines. see this message:
http://search.mnogo.ru/board/message.php?id=1024
in summary, i think there are bugs in checking
they could have kept to one search engine and pooled
their efforts together.
--- Anonymous [EMAIL PROTECTED] wrote:
Author: Neil Fincham
Email:
Message:
Has anyone seen ASPseek?
It's available at
http://www.sw.com.sg/products/aspseek/ and it looks
very similar to mnogosearch. Actually
i'm currently using pgsql and mnogosearch 3.1.8 and
searching takes forever -- especially when searching
for multiple words. i looked into the new cache
indexing mode and it seems fantastically fast. you can
try it out at:
http://udm.aspseek.com/cgi-bin/search.cgi
i would like to know if anyone
yes you did. you dl-ed it and compiled it, then ran
it, and even searched it. well, your efforts aren't in
vain though. you saved me some trouble ;-)
--- Neil [EMAIL PROTECTED] wrote:
they could have kept to one search engine and
pooled
their efforts together.
I agree, I have just
--- "L.T. Harris" [EMAIL PROTECTED] wrote:
Author: L.T. Harris
Email: [EMAIL PROTECTED]
Message:
Thanks, but I'm more confused than ever now.
What is the purpose of the server table that is
created by the create file server.txt?
well if you can get the data into that db table you
can
i'm having the same problem. i emailed the list and
posted on the message board but still no response.
i've read about others having the same problem. so you
aren't alone here.
--- Ernesto Vargas [EMAIL PROTECTED] wrote:
I am having problems running the indexer for more
than 1 or 2 hours. It
i'm indexing my state government's web sites. however,
there are some sites that indexer just stalls on. See
the last line below:
...
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/artcl-30.htm
Indexer[22838]: [1]
http://www.doer.state.mn.us/lr-mlea/artcl-31.htm
Indexer[22838]: [1]
Am I understanding this correctly? If I don't compile
with pthreads, I can't run multiple indexers at once?
Right now on NetBSD/Alpha we don't have native
threads; would it be possible to change mnogosearch to
support another userland thread package like gnu-pth,
PTL2, or mit-pthreads?