Re: UdmSearch: Trouble with Realm command

2001-01-29 Thread Alexander Barkov

Thomas Yengst wrote:
 
 I am trying to do something very simple, but cannot seem to make it
 work. I want to include PowerPoint files (*.ppt) in the URL list, but
 instead index an accompanying text file (*.txt), which is simply the
 PowerPoint file saved as text only. I would think the following
 indexer.conf commands would index all the PowerPoint files in
 http://localhost/test/.
 
 Realm http://localhost/(test/*)\.ppt file:/www/$1\.txt
 Disallow *.txt
 
 The disallow is to prevent the text files from being indexed as well. I
 want the search results to point at the Powerpoint file, not the text
 file. I've tried every variation on the commands that I can think of,
 but I still can't get the words in the text files into the database. I
 even tried to invoke an external parser just to cat the text file, but
 that didn't work either.


Don't forget to add one or more start URLs, using either the "URL"
indexer.conf command or indexer with the -i -u keys.
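A side note on the quoted Realm line, assuming Realm treats its pattern as a regular expression (this sketch does not confirm how mnoGoSearch 3.1.9 parses it): in regex syntax "test/*" means "test" followed by zero or more slashes, so "(test/*)\.ppt" would not match http://localhost/test/talk.ppt at all. The C sketch below uses POSIX regcomp()/regexec() to show the intended rewrite with ".*" and the $1 capture. The function name ppt_to_txt and the file:/www/ target are illustrative only, not mnoGoSearch code.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map http://localhost/test/....ppt to
 * file:/www/test/....txt using a POSIX extended regex and capture $1.
 * Returns 1 on success, 0 if the URL does not match or out is too small. */
static int ppt_to_txt(const char *url, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];
    int ok = 0;

    /* ".*" is required: "test/*" would mean "test" plus zero or more '/' */
    if (regcomp(&re, "^http://localhost/(test/.*)\\.ppt$", REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, url, 2, m, 0) == 0) {
        int n = (int)(m[1].rm_eo - m[1].rm_so);      /* length of $1 */
        int w = snprintf(out, outlen, "file:/www/%.*s.txt",
                         n, url + m[1].rm_so);
        ok = (w >= 0 && (size_t)w < outlen);          /* reject truncation */
    }
    regfree(&re);
    return ok;
}
```

With this pattern, http://localhost/test/talk.ppt becomes file:/www/test/talk.txt, while a .txt URL is left unmatched.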
__
If you want to unsubscribe send "unsubscribe udmsearch"
to [EMAIL PROTECTED]




UdmSearch: Exact search mode suggestion

2001-01-29 Thread

Hi.

The Ispell fuzzy search mode is nice, but please also add the ability to do
exact searches (like Yandex does, i.e. "!word" gives the exact word while
"word" is fuzzy). To implement this, all you need is:

"db" mode: store the exact word in the database too, in addition to its Ispell form.
"text" mode: use or skip the Ispell files depending on the "!" syntax.

-- 
Andrey A. Chernov
http://ache.pp.ru/
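A minimal C sketch of the proposed "!" syntax. Ispell itself is not used here: toy_normal_form() is a hypothetical stand-in that only strips a trailing "s", and term_matches() is likewise an illustrative name. It compares normal forms on the fly, as the "text" mode suggestion describes; in "db" mode the same decision would instead pick between the stored exact word and the stored Ispell form.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for an Ispell normal form: strips one trailing 's'.
 * Real Ispell affix rules are far richer; this is only an assumption. */
static void toy_normal_form(const char *w, char *out, size_t outlen)
{
    size_t n = strlen(w);
    if (n > 1 && w[n - 1] == 's')
        n--;
    snprintf(out, outlen, "%.*s", (int)n, w);
}

/* Does a query term match a document word under the proposed syntax?
 * "!word" compares the exact word; "word" compares normal forms. */
static bool term_matches(const char *query, const char *doc_word)
{
    char qn[64], dn[64];

    if (query[0] == '!')
        return strcmp(query + 1, doc_word) == 0;   /* exact mode */

    toy_normal_form(query, qn, sizeof qn);         /* fuzzy mode */
    toy_normal_form(doc_word, dn, sizeof dn);
    return strcmp(qn, dn) == 0;
}
```

Here "words" finds a document containing "word", while "!words" does not.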






UdmSearch: Webboard: Performance: cache db

2001-01-29 Thread Alexander Barkov

Author: Alexander Barkov
Email: [EMAIL PROTECTED]
Message:
 Any news on this issue? I am also thinking about using built-in mode and now I am 
concerned.

Consider testing cache mode. It can search through several
million documents within 1-2 seconds.

Reply: http://search.mnogo.ru/board/message.php?id=1183





UdmSearch: Webboard: Differences between ASPSEEK and MNOGOSEARCH

2001-01-29 Thread Kir

Author: Kir
Email: [EMAIL PROTECTED]
Message:
Currently ASPSeek can use only MySQL. We are in the process of developing support for other 
SQL databases.

Well, 400,000 is not so big for ASPSeek, so I believe an ordinary PII/PIII with 64-128 MB of RAM 
will be enough. More RAM means faster indexing and searching. And don't forget to tune 
your MySQL (this is described in the ASPSeek FAQ).

The index size is about 1/3 to 1/2 of the total size of the indexed pages, so it
depends on which pages you want to index.

I'm sorry for replying on this board, since it is not meant for ASPSeek. Please address your 
questions about ASPSeek to [EMAIL PROTECTED]

Reply: http://search.mnogo.ru/board/message.php?id=1181
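Kir's rule of thumb (index size about 1/3 to 1/2 of the total size of the indexed pages) translates directly into a quick estimate. The C helper below is a sketch; the 10 KB average page size used in the usage note is an assumption for illustration, not a figure from the thread.

```c
/* Estimated index size bounds from the rule of thumb:
 * roughly 1/3 to 1/2 of the total size of the indexed pages. */
static void index_size_range(unsigned long long total_page_bytes,
                             unsigned long long *lo,
                             unsigned long long *hi)
{
    *lo = total_page_bytes / 3;   /* lower bound: about 1/3 */
    *hi = total_page_bytes / 2;   /* upper bound: about 1/2 */
}
```

For 400,000 pages at an assumed 10 KB (10,240 bytes) each, the pages total 4,096,000,000 bytes, so the index would land somewhere between 1,365,333,333 and 2,048,000,000 bytes.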





UdmSearch: Webboard: Indexing a multilanguage site

2001-01-29 Thread Alexander Barkov

Author: Alexander Barkov
Email: [EMAIL PROTECTED]
Message:
 Hello,
 
 I would like to index a web site that is in various languages, but when a user asks 
the engine to search in a given language, I would like to show results only in 
the language requested.
 
 How should I index the website, given that the main page of each language 
has a different URL?
 
 Do I have to store the data in different databases?
 
 If all languages are stored in one database, will it slow down the queries?
 
 Thanks
 
 

Take a look at the source of search.debian.org. It uses a "lang"
HTML form variable to pass the language to search in.


Reply: http://search.mnogo.ru/board/message.php?id=1184
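The "lang" form variable reaches the search CGI as part of the query string (for example q=kernel&lang=de). A minimal C sketch of extracting it follows; get_form_var() is a hypothetical helper, not a mnoGoSearch function, and it does no %-decoding, on the assumption that language codes are plain ASCII. How the front end then restricts results to that language is outside this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of form variable `name` from a
 * CGI query string like "q=kernel&lang=de" into `out`.
 * Returns 1 if found, 0 otherwise. Values longer than outlen-1 are truncated. */
static int get_form_var(const char *query, const char *name,
                        char *out, size_t outlen)
{
    size_t nlen = strlen(name);
    const char *p = query;

    if (outlen == 0)
        return 0;

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, '&');               /* end of this pair */
        size_t plen = end ? (size_t)(end - p) : strlen(p);

        /* match "name=" exactly at the start of the pair */
        if (plen > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            size_t vlen = plen - nlen - 1;
            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, p + nlen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p = end ? end + 1 : NULL;                       /* next pair */
    }
    return 0;
}
```

Matching on "name=" at the start of each pair keeps a variable such as "xlang" from being mistaken for "lang".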





UdmSearch: Charset guesser needs an option

2001-01-29 Thread

Please add a config file option to turn the Cyrillic charset guesser on or
off, even when the guesser is compiled in with --with-charset-guesser. This
would allow building a binary package automatically that covers the widest
range of capabilities, and then turning the guesser off in the config files
for sites that really don't need it.

-- 
Andrey A. Chernov
http://ache.pp.ru/
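One way the requested switch could look, sketched in C: the compile-time setting only provides the default, and a config line can turn the guesser off at run time. Both the HAVE_CHARSET_GUESSER macro and the "CharsetGuesser yes|no" directive are hypothetical names, not existing mnoGoSearch options.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical macro that --with-charset-guesser would define (name assumed). */
#ifndef HAVE_CHARSET_GUESSER
#define HAVE_CHARSET_GUESSER 1
#endif

/* Run-time switch; defaults to whatever the build provides. */
static int guesser_enabled = HAVE_CHARSET_GUESSER;

/* Handle a hypothetical indexer.conf line "CharsetGuesser yes|no".
 * Returns 1 if the line was this directive, 0 otherwise. */
static int parse_guesser_option(const char *line)
{
    char cmd[32], arg[8];

    if (sscanf(line, "%31s %7s", cmd, arg) != 2 ||
        strcmp(cmd, "CharsetGuesser") != 0)
        return 0;                       /* some other directive */

    /* The option can only disable a guesser that was compiled in. */
    guesser_enabled = HAVE_CHARSET_GUESSER && strcmp(arg, "yes") == 0;
    return 1;
}
```

A packager could then ship one binary built with the guesser and leave it to each site's indexer.conf to switch it off.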






Re: UdmSearch: Charset guesser needs an option

2001-01-29 Thread Alexander Barkov

Andrey Chernov wrote:
 
 Please add a config file option to turn the Cyrillic charset guesser on or
 off, even when the guesser is compiled in with --with-charset-guesser. This
 would allow building a binary package automatically that covers the widest
 range of capabilities, and then turning the guesser off in the config files
 for sites that really don't need it.
 


Thanks for the suggestion. We'll do it.




Re: UdmSearch: Charset guesser needs an option

2001-01-29 Thread Alexander Barkov

Andrey Chernov wrote:
 
 On Mon, Jan 29, 2001 at 19:31:21 +0400, Alexander Barkov wrote:
   and then turning the guesser off in the config files for sites that
   really don't need it.
  
 
 
  Thanks for the suggestion. We'll do it.
 
 Thanx.
 
 BTW, there should be a way to specify the charset or the guesser on a
 per-"Server" basis, i.e. as an additional field in the "Server" directive.
 I mean that different "Server"s may have different known or guessed
 charsets, which will be converted to the local mnogosearch charset.
 
 I haven't checked yet, but the following question arises:
 
 Do you analyze the charset coming from the HTTP header (for the http://
 scheme), or detect only the META charset? According to RFC 2070 (HTML
 internationalization), the HTTP header charset must have priority over
 the META charset.

It checks the charset in the following order:

   charset in the "Content-Type" HTTP header
   charset in META NAME="Content-Type"
   Charset indexer.conf command


When the Cyrillic charset guesser is compiled in, indexer ignores all of
the above and uses the guesser.
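The selection order described above can be summarized as a small C function. This is a sketch of the described behaviour, not the actual indexer code; choose_charset() and its parameters are hypothetical, and NULL stands for "not present".

```c
#include <stddef.h>

/* Sketch of the described charset priority. Any argument may be NULL. */
static const char *choose_charset(const char *http_header_charset,
                                  const char *meta_charset,
                                  const char *config_charset,
                                  int guesser_compiled,
                                  const char *guessed_charset)
{
    if (guesser_compiled)            /* guesser overrides everything */
        return guessed_charset;
    if (http_header_charset)         /* 1. Content-Type HTTP header */
        return http_header_charset;
    if (meta_charset)                /* 2. META Content-Type */
        return meta_charset;
    return config_charset;           /* 3. Charset indexer.conf command */
}
```

A per-"Server" charset, as requested in the quoted message, would presumably slot in where config_charset is.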




UdmSearch: Webboard: Crash!

2001-01-29 Thread Mario Gray

Author: Mario Gray
Email: [EMAIL PROTECTED]
Message:
The backtrace looks like this:
#0 0x40097b3c in free () from /lib/libc.so.6
#1 0x40097aed in free () from /lib/libc.so.6
#2 0x804c074 in UdmIndexNextURL (Indexe = 0x807b990, index_flags=4) at indexer.c:621
#3 0x8049eb7 in thread_main (arg=0x0) at main.c:253
#4 0x804a7e2 in main(argc=1, argv=0xbd18) at main.c:582

Please help me if you have ANY idea about this type of crash. 

Thanx


Reply: http://search.mnogo.ru/board/message.php?id=1185





UdmSearch: Webboard: Crash! Tainted prefix dirs

2001-01-29 Thread Mario Gray

Author: Mario Gray
Email: [EMAIL PROTECTED]
Message:
OK, it's my bad... I have NO idea why, but here it is:
UdmSearch 3.1.9 was installed into the same prefix as 3.1.8, so when indexer (3.1.9)
tries to do anything, it finds tainted information (config files?) and suddenly leaps 
off the deep end into a catastrophic loop of memory leakage.
I thought that it only looks at the indexer.conf file, but someone please tell me if I 
am wrong in assuming so.

 l8r


Reply: http://search.mnogo.ru/board/message.php?id=1186





UdmSearch: Server order

2001-01-29 Thread Caffeinate The World

If indexer follows the order of the Server commands in the
indexer.conf file, so that subsections are indexed before their
parent sections:

Server http://host/depth1/depth2/
Server http://host/

how do you specify such an order in the ServerTable used with SQL?
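If matching is first-match in list order, as the question assumes, the point of listing http://host/depth1/depth2/ before http://host/ is that the more specific entry wins. A C sketch of one way to get that order without relying on row order: sort the prefixes by length, longest first, then take the first match. by_length_desc() and match_server() are illustrative names, not mnoGoSearch functions. In SQL, something like ORDER BY LENGTH(url) DESC might achieve the same, assuming the table's URL column is named url.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: longer (more specific) prefixes sort first. */
static int by_length_desc(const void *a, const void *b)
{
    size_t la = strlen(*(const char *const *)a);
    size_t lb = strlen(*(const char *const *)b);
    return (la < lb) - (la > lb);
}

/* First-match lookup of a URL against an ordered list of Server prefixes. */
static const char *match_server(const char **servers, size_t n,
                                const char *url)
{
    for (size_t i = 0; i < n; i++)
        if (strncmp(url, servers[i], strlen(servers[i])) == 0)
            return servers[i];
    return NULL;                      /* no Server entry covers this URL */
}
```

After sorting, http://host/depth1/depth2/a.html matches the depth2 entry even if http://host/ was stored first.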

__
Do You Yahoo!?
Yahoo! Auctions - Buy the things you want at great prices. 
http://auctions.yahoo.com/




UdmSearch: splitter still core dumps on 3.1.9

2001-01-29 Thread Caffeinate The World

NetBSD/Alpha (64-bit). I reported this a while back for 3.1.9pre13. It looks like
it was not fixed in 3.1.9. I'm using cache mode.

# gdb /usr/local/install/mnogosearch-3.1.9/sbin/splitter
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "alpha--netbsd"...
(gdb) run -f 77c -t 77c
Starting program: /usr/local/install/mnogosearch-3.1.9/sbin/splitter -f 77c -t 77c
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0B000 old:   0 new:   1 total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C0C000 old:   0 new:   1 total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C15000 old:   0 new:   6 total:   6
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1C000 old:   0 new:   1 total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C1F000 old:   0 new:   1 total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C23000 old:   0 new:   1 total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2B000 old:   0 new:   2 total:   2
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2E000 old:   0 new:   1 total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C2F000 old:   0 new:   1 total:   1
/usr/local/install/mnogosearch-3.1.9/var/tree/77/C/77C3 old:   0 new:36482 total:36482

Program received signal SIGSEGV, Segmentation fault.
0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
601                             table[header.ntables].wrd_id=logwords[t-1].wrd_id;
(gdb) bt
#0  0x1200182b0 in UdmSplitCacheLog (log=79072) at cache.c:601
#1  0x120002ae0 in main (argc=1917, argv=0x1f8c0) at splitter.c:70
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0x1400013550
This warning occurs if you are debugging a function without any symbols
(for example, in a stripped executable).  In that case, you may wish to
increase the size of the search with the `set heuristic-fence-post' command.

Otherwise, you told GDB there was a function where there isn't one, or
(more likely) you have encountered a bug in GDB.
(gdb) l
596             logwords[count].weight=0;
597
598             for(t=1;t<count+1;t++){
599                     if((logwords[t-1].wrd_id!=logwords[t].wrd_id)||
600                        (logwords[t-1].weight!=logwords[t].weight)){
601                             table[header.ntables].wrd_id=logwords[t-1].wrd_id;
602                             table[header.ntables].weight=logwords[t-1].weight;
603                             table[header.ntables].pos=pos;
604                             table[header.ntables].len=t*sizeof(UDM_CACHEWORD)-pos;
605                             pos+=table[header.ntables].len;
(gdb)
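The listed loop walks the sorted log words and emits one table entry per run of equal wrd_id and weight, writing into table[header.ntables] with no size check visible in these lines. The C sketch below reproduces that grouping with simplified, hypothetical types (struct log_word, struct table_entry) and an explicit capacity check, as one way to rule out a table overflow as the cause of the SIGSEGV at line 601. It assumes header.ntables is incremented after line 605, which the listing does not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the cache.c structures. */
struct log_word    { uint32_t wrd_id; uint32_t weight; };
struct table_entry { uint32_t wrd_id; uint32_t weight; size_t pos; size_t len; };

/* Group runs of equal (wrd_id, weight) in words[0..count-1] into table.
 * As in the listing, words[count] must be a sentinel that differs from
 * words[count-1]. Returns the number of entries, or -1 if cap is exceeded. */
static long group_runs(const struct log_word *words, size_t count,
                       size_t elem_size,
                       struct table_entry *table, size_t cap)
{
    size_t ntables = 0, pos = 0;

    for (size_t t = 1; t < count + 1; t++) {
        if (words[t - 1].wrd_id != words[t].wrd_id ||
            words[t - 1].weight != words[t].weight) {
            if (ntables == cap)
                return -1;                          /* table full: stop */
            table[ntables].wrd_id = words[t - 1].wrd_id;
            table[ntables].weight = words[t - 1].weight;
            table[ntables].pos = pos;
            table[ntables].len = t * elem_size - pos;   /* bytes in run */
            pos += table[ntables].len;
            ntables++;
        }
    }
    return (long)ntables;
}
```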

__
Get personalized email addresses from Yahoo! Mail - only $35 
a year!  http://personal.mail.yahoo.com/




Re: UdmSearch: splitter still core dumps on 3.1.9

2001-01-29 Thread Caffeinate The World

Just a quick note: I changed all occurrences of "size_t" in cache.c to
"u_int32_t" and recompiled splitter. It seems it no longer core dumps on
log files like before. Note that I had to run "splitter -p" to get new files in
./splitter and then run "splitter". I've only been able to test this on a small
set of logs. Related to this, I also changed "size_t" in cachelogd.c to
"unsigned int". For some reason, if I changed it to "u_int32_t", my server ran at
a very high load. It usually sits at around 1, but when I ran cachelogd with the
"u_int32_t" changes, the system load went over 30. Scary.

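A plausible reason the size_t to u_int32_t change matters on NetBSD/Alpha, offered as a hypothesis rather than a diagnosis: size_t is 8 bytes on 64-bit Alpha but 4 bytes on 32-bit machines, so any structure with size_t fields that is written to disk or passed between programs changes layout with the platform. The C sketch below uses the standard uint32_t (u_int32_t is the BSD spelling of the same idea); struct rec_fixed and the pack/unpack helpers are hypothetical, not the real cache.c records.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record: with size_t fields its size follows the platform
 * (12 bytes on 32-bit, 24 bytes on LP64 such as NetBSD/Alpha). */
struct rec_native { size_t wrd_id; size_t weight; size_t len; };

/* Fixed-width version: 12 bytes everywhere. */
struct rec_fixed  { uint32_t wrd_id; uint32_t weight; uint32_t len; };

/* Serialize into exactly 12 bytes. Host byte order is kept; explicit byte
 * swapping would be needed if files move between big- and little-endian hosts. */
static void rec_pack(const struct rec_fixed *r, unsigned char out[12])
{
    memcpy(out,     &r->wrd_id, 4);
    memcpy(out + 4, &r->weight, 4);
    memcpy(out + 8, &r->len,    4);
}

static void rec_unpack(const unsigned char in[12], struct rec_fixed *r)
{
    memcpy(&r->wrd_id, in,     4);
    memcpy(&r->weight, in + 4, 4);
    memcpy(&r->len,    in + 8, 4);
}
```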


