Re: UdmSearch: Log Files

2000-11-03 Thread Alexander Barkov

Paul Stewart wrote:
>
> What happens in cache db mode if you don't delete the log files but run an
> update with new information from the indexer.conf file?  Is the system aware
> enough of the contents of the logs to add sites without creating
> duplicate entries and/or remove stuff that's in the logs and that needs to
> be removed from the cache as well?
>
> I guess my question is:  Does it matter if you delete the log files each
> time you run splitter?

You can either delete the log files or keep them. Splitter is smart enough
not to create duplicates: if a document appears two or more times in the
log files, splitter takes the newest version.
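
The "newest version wins" rule can be illustrated with a minimal sketch.
This is not splitter's actual code: the record layout (url_id, timestamp,
words) and the function name are assumptions made up for illustration, and
the real cache-mode log format is not shown in this thread.

```python
from typing import NamedTuple


class LogRecord(NamedTuple):
    """Hypothetical log entry; the real cache-mode log layout is different."""
    url_id: int
    timestamp: int  # assumed: when the document was indexed
    words: tuple


def newest_per_document(records):
    """Keep only the most recently indexed record for each url_id."""
    newest = {}
    for rec in records:
        seen = newest.get(rec.url_id)
        if seen is None or rec.timestamp >= seen.timestamp:
            newest[rec.url_id] = rec
    return list(newest.values())


log = [
    LogRecord(1, 100, ("old", "text")),
    LogRecord(2, 110, ("other",)),
    LogRecord(1, 200, ("new", "text")),  # same document, indexed again later
]
result = newest_per_document(log)
```

Under these assumptions, keeping old log files costs extra work per run but
does not produce duplicate entries, because older records for the same
document are simply superseded.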
__
If you want to unsubscribe send "unsubscribe udmsearch"
to [EMAIL PROTECTED]




Re: UdmSearch: New message on the WebBoard #1: Segmentation Fault

2000-11-03 Thread Alexander Barkov

Thomas Yengst wrote:
>
> Alexander Barkov wrote:
> >
> > Author: Alexander Barkov
> > Email: [EMAIL PROTECTED]
> > Message:
> > It is a bug in HTDB, you may try to increase
> > MaxDocSize in indexer.conf.
> >
>
> This works - I have 100,000 rows in a table and increasing MaxDocSize by
> a factor of 10 makes indexer not seg-fault.
>

Yes, it should work. It is an old, known bug and we'll fix it sometime.
We haven't done so yet only because it is difficult to fix within the
current program structure and not many people use HTDB.
So it is a low-priority task.




UdmSearch: I think i discovered a bug in boolean search

2000-11-03 Thread Div

I am using MnoGoSearch 3.1.8 and a slightly modified version of the PHP frontend
UDMSearch-php-3.1.1.2. While searching with the "|" operator or with
"any match" I ran into a strange bug: if I search for 2 or more words, then
one word in the query appears to generate more results than it should.

Thinking this was a bug in the PHP frontend, I searched for the same words using
http://udm.aspseek.com/cgi-bin/search.cgi. But it happened again. Here are
the strange results:

Search for: cooling reactor http

Search results:  cooling: 4396 reactor: 1645 http: 4358918
Displaying documents 1-20 of total 4358629 found. 

Notice that for the word "http" it shows 4,358,918 results, but only
4,358,629 documents are actually found.

Can you do something about it?




Re: UdmSearch: I think i discovered a bug in boolean search

2000-11-03 Thread Alexander Barkov

Div wrote:
>
> I am using MnoGoSearch 3.1.8 and a slightly modified version of the PHP frontend
> UDMSearch-php-3.1.1.2. While searching with the "|" operator or with
> "any match" I ran into a strange bug: if I search for 2 or more words, then
> one word in the query appears to generate more results than it should.
>
> Thinking this was a bug in the PHP frontend, I searched for the same words using
> http://udm.aspseek.com/cgi-bin/search.cgi. But it happened again. Here are
> the strange results:
>
> Search for: cooling reactor http
>
> Search results:  cooling: 4396 reactor: 1645 http: 4358918
> Displaying documents 1-20 of total 4358629 found.
>
> Notice that for the word "http" it shows 4,358,918 results, but only
> 4,358,629 documents are actually found.

The example on udm.aspseek.com does not have boolean search. An old
version without boolean search is installed there.




Re: UdmSearch: I think i discovered a bug in boolean search

2000-11-03 Thread Div

On Fri 03 Nov 2000 09:27, you wrote:
> Div wrote:
> > I am using MnoGoSearch 3.1.8 and a slightly modified version of the PHP
> > frontend UDMSearch-php-3.1.1.2. While searching with the "|" operator or
> > with "any match" I ran into a strange bug: if I search for 2 or more
> > words, then one word in the query appears to generate more results than it
> > should.
> >
> > Thinking this was a bug in the PHP frontend, I searched for the same words
> > using http://udm.aspseek.com/cgi-bin/search.cgi. But it happened again.
> > Here are the strange results:
> >
> > Search for: cooling reactor http
> >
> > Search results:  cooling: 4396 reactor: 1645 http: 4358918
> > Displaying documents 1-20 of total 4358629 found.
> >
> > Notice that for the word "http" it shows 4,358,918 results, but only
> > 4,358,629 documents are actually found.
>
> The example on udm.aspseek.com does not have boolean search. An old
> version without boolean search is installed there.

Yes, you're right. On udm.aspseek.com I searched using "any match". Anyway,
the results are still wrong!




Re: UdmSearch: I think i discovered a bug in boolean search

2000-11-03 Thread Alexander Barkov

 
> > The example on udm.aspseek.com does not have boolean search. An old
> > version without boolean search is installed there.
>
> Yes, you're right. On udm.aspseek.com I searched using "any match". Anyway,
> the results are still wrong!


All/any match does not work there either.




Re: UdmSearch: I think i discovered a bug in boolean search

2000-11-03 Thread Alexander Barkov

> Well, I don't give up! :)
> I searched on udm.aspseek.com only for http and it says:
>
> Search for: http
> Search results: http: 4358918
> Displaying documents 1-20 of total 4358622 found.


We installed a very early beta version on aspseek.com almost
half a year ago, so very strange things may happen there.



 
> Why is the "search results" number higher than the "documents found" number?
> I remembered how I searched using boolean expressions on udm.aspseek.com:
>
> Search for: (http | web)
> Search results: http: 4358918 web: 648289
> Displaying documents 1-20 of total 4358897 found.
>
> The same thing happened to me on the PHP frontend. Could you please help me?


This seems to be a bug in the PHP front-end. Sergey, could you please help?




Re[2]: UdmSearch: I think i discovered a bug in boolean search

2000-11-03 Thread Sergey Kartashoff

Hi!

Friday, November 03, 2000, 12:42:53 PM, you wrote:

> Search for: (http | web)
> Search results: http: 4358918 web: 648289
> Displaying documents 1-20 of total 4358897 found.
>
> The same thing happened to me on the PHP frontend. Could you please help me?

AB> This seems to be a bug in PHP front-end. Sergey, could you help please?

I need the debug output from search.php. Could you please send it to me?

-- 
Regards, Sergey aka gluke.






Re: Re[2]: UdmSearch: I think i discovered a bug in boolean search

2000-11-03 Thread Div

On Fri 03 Nov 2000 10:49, Sergey Kartashoff wrote:
> Hi!

> Friday, November 03, 2000, 12:42:53 PM, you wrote:
> > Search for: (http | web)
> > Search results: http: 4358918 web: 648289
> > Displaying documents 1-20 of total 4358897 found.
> >
> > The same thing happened to me on the PHP frontend. Could you please help
> > me?

> AB> This seems to be a bug in PHP front-end. Sergey, could you help please?

> I need the debug output from search.php. Could you please send it to me?

I think the most eloquent example is:

Begin ParseQ(): q=page | page | page
Begin ParseStr(): qwe=page | page | page

End ParseStr(): qwe=page|page|page

End ParseQ(): q=( page || page || page )
load_cs(): SELECT path,link,name FROM categories WHERE path LIKE '__' ORDER 
BY NAME ASC

load_cp(): SELECT name FROM categories WHERE path='' OR path IS NULL

is_stopword(): SELECT count(*) FROM stopwor WHERE word='page'

is_stopword(): SELECT count(*) FROM stopword WHERE word='page'

is_stopword(): SELECT count(*) FROM stopword WHERE word='page'

last_parse(): CREATE /*!32302 TEMPORARY */ TABLE t97325615967651800 ( url_id 
INT DEFAULT '0' NOT NULL, word_id INT DEFAULT '0' NOT NULL, intag TINYINT 
DEFAULT '0' NOT NULL, KEY i1t97325615967651800(url_id), KEY 
i2t97325615967651800(word_id))

last_parse(): INSERT INTO t97325615967651800 SELECT url_id,word_id,intag FROM 
ndict4 WHERE word_id = 336246304

last_parse(): SELECT count(*) FROM t97325615967651800

last_parse(): INSERT INTO t97325615967651800 SELECT url_id,word_id,intag FROM 
ndict4 WHERE word_id = 336246304

last_parse(): SELECT count(*) FROM t97325615967651800

last_parse(): INSERT INTO t97325615967651800 SELECT url_id,word_id,intag FROM 
ndict4 WHERE word_id = 336246304

last_parse(): SELECT count(*) FROM t97325615967651800

main(): SELECT SQL_SMALL_RESULT url_id, sum(intag) as r FROM 
t97325615967651800 GROUP BY url_id HAVING (( ( sum(word_id=336246304)>0 ) OR 
( sum(word_id=336246304)>0 ) OR ( sum(word_id=336246304)>0 ) )) ORDER BY r 
DESC


Search results: page: 399; page: 399; page: 798; 

Documents 1-10 of 399 pages found.

SearchTime: 1.15s.

drop_temp_table(): DROP TABLE t97325615967651800




Re[6]: UdmSearch: I think i discovered a bug in boolean search

2000-11-03 Thread Sergey Kartashoff

Hi!

Friday, November 03, 2000, 3:37:11 PM, you wrote:

D> I chose that senseless query to show that for the third word the number of
D> results is incorrect.

D> Here is another example (notice that the number of results for "http" is
D> equal to 1722+399, where 1722 is the correct one and 399 corresponds to "page"):

OK, thank you.
Could you please send me the results of these queries:

SELECT count(*) FROM ndict4 WHERE word_id = 410646757
SELECT count(*) FROM ndict4 WHERE word_id = 336246304
SELECT count(*) FROM ndict4 WHERE word_id = -1753739854

-- 
Regards, Sergey aka gluke.






UdmSearch: New message on the WebBoard #1: Segmentation Fault - core dumped after indexer

2000-11-03 Thread Vlad Gerasimov

Author: Vlad Gerasimov
Email: [EMAIL PROTECTED]
Message:
Hello!

version: mnogosearch-3.1.8 with mysql

After running indexer I got:
...
[1] http://www.novgorod.ru/city/culture/fufin/photo/22.htm
[1] http://www.novgorod.ru/city/culture/fufin/photo/23.htm
[1] http://www.novgorod.ru/city/culture/fufin/photo/24.htm
[1] Done (1097 seconds)
Segmentation Fault - core dumped

Is it a bug, a feature, or what?

Thanks in advance,
Vlad

Reply: http://search.mnogo.ru/board/message.php?id=657





Re[6]: UdmSearch: I think i discovered a bug in boolean search

2000-11-03 Thread Sergey Kartashoff

Hi!

Friday, November 03, 2000, 3:45:17 PM, you wrote:


D> if ($row=fetch_row($res)) {
D> $count=$row[0]-$count;
D> $wordsinfo .= "<b>$old_norm_word</b>: $count; ";

D> $count=$row[0];

D> }
 
ok, thank you! I will include this in the next release.
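
For readers following the arithmetic, here is a small sketch of why the
counts came out wrong and what the patch changes. It is an illustration
only, not the front-end's code: the function names are made up, and the
running totals are an assumption inferred from the debug output earlier in
the thread (three INSERTs for the same word, each assumed to add 399 rows,
with a count(*) on the temporary table after each one).

```python
def per_word_counts_buggy(running_totals):
    """Suspected old behaviour: subtract the previous word's count."""
    counts, count = [], 0
    for total in running_totals:
        # 'count' still holds the previous per-word result, not the
        # previous running total, so later words are over-counted.
        count = total - count
        counts.append(count)
    return counts


def per_word_counts_fixed(running_totals):
    """Patched behaviour: subtract the previous running total."""
    counts, previous_total = [], 0
    for total in running_totals:
        counts.append(total - previous_total)
        previous_total = total  # remember the cumulative row count
    return counts


# Assumed row counts of the temporary table after each INSERT
# for the query "page | page | page" (399 rows added each time).
totals = [399, 798, 1197]
buggy = per_word_counts_buggy(totals)
fixed = per_word_counts_fixed(totals)
```

With these assumed totals the first function yields 399, 399, 798, which
matches the "page: 399; page: 399; page: 798" line reported above, and the
second yields 399 for every word.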

-- 
Regards, Sergey aka gluke.






UdmSearch: New message on the WebBoard #1: Segmentation Fault - core dumped after indexer

2000-11-03 Thread Vlad Gerasimov

Author: Vlad Gerasimov
Email: [EMAIL PROTECTED]
Message:
System: Solaris 7 x86
mysql: 3.23.21-beta

Vlad

Reply: http://search.mnogo.ru/board/message.php?id=658





Re: UdmSearch: New message on the WebBoard #1: Segmentation Fault - core dumped after indexer

2000-11-03 Thread Sergey Kartashoff

Hi!

Friday, November 03, 2000, 1:47:08 PM, you wrote:

VG> After indexer I got:
VG> ...
VG> [1] http://www.novgorod.ru/city/culture/fufin/photo/22.htm
VG> [1] http://www.novgorod.ru/city/culture/fufin/photo/23.htm
VG> [1] http://www.novgorod.ru/city/culture/fufin/photo/24.htm
VG> [1] Done (1097 seconds)
VG> Segmentation Fault - core dumped

VG> It's bug or feature or what??

Please load indexer and its core dump into gdb:

gdb indexer core
(gdb) backtrace

and send us the output.

-- 
Regards, Sergey aka gluke.






UdmSearch: New message on the WebBoard #1: download timeout ?!

2000-11-03 Thread Martin Perst

Author: Martin Perst
Email: [EMAIL PROTECTED]
Message:
When my db reaches, for example, 50,000 links, this happens quite often while
indexing:
Indexer[27668]: [1] http://vsk.sk/banky/klbanky.cgi?id=6500
Indexer[27668]: [1] Download timeout
Indexer[27668]: [1] http://rambo.elt.sk/minolta/minolta_t45_dealer.exe
Indexer[27668]: [1] Download timeout
Indexer[27668]: [1] http://www.minolta.sk/produkty/kopirovacie_stroje/di_151p.shtml
Indexer[27668]: [1] Download timeout

ReadTimeout is set to 90. When I try to access the given sites using Lynx from the
shell, I always get them in under 3-4 seconds. Where is the problem?

Reply: http://search.mnogo.ru/board/message.php?id=660





UdmSearch: New message on the WebBoard #1: Segmentation Fault - core dumped after indexer

2000-11-03 Thread Vlad Gerasimov

Author: Vlad Gerasimov
Email: [EMAIL PROTECTED]
Message:
backtrace:

#0  0xdfb54806 in t_splay () from /usr/lib/libc.so.1
#1  0xdfb5470c in t_delete () from /usr/lib/libc.so.1
#2  0xdfb5447f in realfree () from /usr/lib/libc.so.1
#3  0xdfb549a3 in _free_unlocked () from /usr/lib/libc.so.1
#4  0xdfb5493c in free () from /usr/lib/libc.so.1
#5  0x805492a in UdmFreeStopList (Env=0x8080900) at sql.c:2144
#6  0x8067f78 in UdmFreeEnv (Env=0x8080900) at env.c:195
#7  0x804e66d in main (argc=1, argv=0x8047e18) at main.c:563

Reply: http://search.mnogo.ru/board/message.php?id=661





UdmSearch: Some questions

2000-11-03 Thread Briggs, Gary

I'm not having much luck here; I'm hoping you could answer a couple of
questions for me:

1) Security: I'm using the MySQL backend; how can I set up a user called
udm_ro [or similar] so that if the password is stolen, the person who stole
it can't delete any of the database? It's not possible just to remove create
and drop permissions, because it uses temporary tables... Does anyone other
than me even worry about this?

2) Tags: This is the best way to search through a selection of servers,
right? What can I do in search.htm to make a search on two of the tags, but
not any of the others? [I'm using the most up-to-date php front end]

Thank-you very much,
Gary (-;




UdmSearch: Strange error:

2000-11-03 Thread Briggs, Gary

I just did a search for "chunky", and this happened:


Search Time: 2.56s
Search results: looking for: chunky; chunky: 26; 

Sorry, but search returned no results.

Please try a different query. 


You can try to search this words: 


Argh! I'm using MySQL and the PHP front-end.

Gary (-;