[htdig] htdig database related questions

2000-12-06 Thread Haeberlen

Hi,

Is there a way of "editing" the htdig documents database after the 
dig is finished? We tried the BerkeleyDB tools that are included 
in the htdig distribution, but db_dump, for example, refuses to do 
anything with any of the database files. The error messages always 
look like this:

db_dump: database_file: page X doesn't exist, create flag not set
db_dump: dbp-stat: I/O error

Is there anything wrong with our db files? htsearch seems to be able
to use them, though. Am I missing something?

Why do I want to "edit" the db files at all? The reason is that we have 
a large database with quite a number of things we'd like to exclude 
from the search results. The obvious solution would be to exclude them
from the dig in the first place. But I don't consider this possible 
because a) this would make the config quite bulky and b) it would be
desirable to be able to delete certain things from the database between
the regular digs without having to run a "full update" for each newly
discovered "exclude candidate".

Any suggestions? Many thanks in advance.

Cheers,

Thomas

PS: How does htdig handle the case where a document is in the docs database
but the corresponding URL is added to the exclude list? Will the document
be deleted from the db on the next update run, or would I have to delete the
db and run a "full index" again?



-- 
Thomas Haeberlen
Email: [EMAIL PROTECTED]


To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  http://www.htdig.org/mail/menu.html
FAQ:            http://www.htdig.org/FAQ.html




Re: [htdig] htdig database related questions

2000-12-06 Thread Geoff Hutchison

At 9:35 AM +0100 12/6/00, [EMAIL PROTECTED] wrote:
> Is there anything wrong with our db files? htsearch seems to be able
> to use them, though. Am I missing something?

No, but I don't think you want to use the db_dump programs to deal 
with them. In particular, ht://Dig "serializes" the documents in the 
document DB and can compress the excerpts, so large parts will come 
out in binary.

> Why do I want to "edit" the db files at all? The reason is that we have
> a large database with quite a number of things we'd like to exclude
> from the search results. The obvious solution would be to exclude them
> from the dig in the first place. But I don't consider this possible
> because a) this would make the config quite bulky

You can always include a file in the config file, e.g.:
exclude_urls: `/path/to/patterns`
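
A minimal sketch of keeping such a pattern file up to date, assuming the
file may hold one pattern per line. The function name write_patterns and
the sample patterns are illustrative assumptions, not part of ht://Dig:

```python
from pathlib import Path


def write_patterns(path, patterns):
    """Write unique, non-empty patterns to `path`, one per line.

    Returns the cleaned list in the order it was written.
    """
    cleaned = []
    for pattern in patterns:
        pattern = pattern.strip()
        # Skip blanks and duplicates so the file stays small and readable.
        if pattern and pattern not in cleaned:
            cleaned.append(pattern)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


# Example call (hypothetical path and patterns):
# write_patterns("/path/to/patterns", ["/cgi-bin/", ".cgi", "/private/"])
```

The config file would then refer to the generated file with the
backquote syntax shown above, so the main config stays short however
long the list grows.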

In the 3.2 code, you can do limited editing with the new htdump and 
htload programs. On the other hand, if you just want to delete URLs, 
it's much easier with the new htpurge program instead.

> PS: How does htdig handle the case where a document is in the docs database
> but the corresponding URL is added to the exclude list? Will the document
> be deleted from the db on the next update run, or would I have to delete the
> db and run a "full index" again?

The exclude_urls pattern set is only used when considering whether to 
index a new URL. So if a URL is already in the database, it will not 
be removed. There is a similar but more serious problem if a 
document is added to the robots.txt file. In both cases the code is 
upholding the "letter of the law," but the behavior is a bit hazy.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re: [htdig] Pb indexing HTML with htdig 3.1.5

2000-12-06 Thread Geoff Hutchison

At 8:54 AM +0100 12/6/00, André LAGADEC wrote:
> I think that htdig doesn't like the HTML code "<!--//" and "//-->":
> it sees the beginning of the comment but not the end, and ignores the
> rest of the page's HTML code.

This is probably correct from the output you sent.

> Am I right? Any other ideas? What can I do?

Can you edit the document as an initial workaround? If not, you (or 
someone else) will need to edit the HTML.cc file to make the comment 
patterns less picky.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/










[htdig] Can htdig kill Linux?

2000-12-06 Thread David Gewirtz


I just love getting to know new software. There's always some form of 
teething pain. Yesterday, I started running my first set of reasonably 
large htdig/htmerge processes. Came in today to find the Linux server 
(which is running nothing besides basic Mandrake processes and, of course, 
htdig) was deader than a doornail (have to say "deader than" because saying 
"hung more than" would just be too weird).

In any case, I couldn't telnet into the Linux box, couldn't run my KDE 
console, nada.

I've never seen Linux hang like that before. Almost makes me wish for NT.

But the net of it is this: is htdig liable to do this? Can I count on using 
htdig in a production environment or do I need to go back to square one? 
What's your experience? Advice?

Thanks in advance,

David







Re: [htdig] htdig database related questions

2000-12-06 Thread Geoff Hutchison

On Wed, 6 Dec 2000 [EMAIL PROTECTED] wrote:

>> You can always include a file in the config file, e.g.:
>> exclude_urls: `/path/to/patterns`
>
> Ok, so that works for exclude_urls, as well (... maybe we just shoulda
> tried that in the first place). Fine.

It works for any config attribute:

http://www.htdig.org/cf_variables.html

>> In the 3.2 code, you can do limited editing with the new htdump and
>> htload programs. On the other hand, if you just want to delete URLs,
>> it's much easier with the new htpurge program instead.
>
> Will the 3.2 version be officially released soon? So far all I've seen
> is 3.2b2 ... can this be considered as stable enough for a production
> environment?

A 3.2.0b3 release will be coming out soon--until that point I'd suggest
using the development snapshots in preference to 3.2.0b2, which has many
known bugs.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

 (rest of message attached)

>> The exclude_urls pattern set is only used when considering whether to
>> index a new URL. So if a URL is already in the database, it will not
>> be removed.
>
> Ok. That would be a case for the above mentioned htpurge program, I guess.
>
> Thanks for the quick reply. I think we will have a look at the 3.2b2
> version.
>
> Best wishes,
>
> Thomas Haeberlen
>
> --
> Thomas Haeberlen
> Rechenzentrum Universitaet Stuttgart (RUS)
> Abteilung Informationsdienste
> Allmandring 30 , D-70569 Stuttgart
> Email: [EMAIL PROTECTED]
> Phone: +49 711 685 47 19 Fax: +49 711 678 76 26
 








Re: [htdig] Can htdig kill Linux?

2000-12-06 Thread Douglas S. Davis

I can answer this one!!!

I have Red Hat Linux 5 and I have never, ever crashed it, even when
using htDig to index both my internal and external web sites.  This
sounds like something deeper, like a hard disk error or bad RAM that
caused a crash when htDig ran across it.  I would recommend a good set
of tools like Norton to test everything out.

Hope this is Helpful,
Doug

--
|  Information Systems Coordinator
|  Monical Pizza Corporation
|  http://www.monicals.com
| - - - - - - - - - -
|  "Home of the Family Pleaser . . . People Pleasing People"
| - - - - - - - - - -
|  815/937-1890 - Voice   815/937-9828 - Fax
--








Re: [htdig] Can htdig kill Linux?

2000-12-06 Thread Geoff Hutchison

On Wed, 6 Dec 2000, David Gewirtz wrote:

> In any case, I couldn't telnet into the Linux box, couldn't run my KDE
> console, nada.
>
> I've never seen Linux hang like that before. Almost makes me wish for NT.

I have only rarely seen a server dead enough that it wouldn't accept
outside connections (e.g. telnet or ssh).

> But the net of it is this: is htdig liable to do this? Can I count on using
> htdig in a production environment or do I need to go back to square one?
> What's your experience? Advice?

You can certainly count on using 3.1.5 in a production environment--I have
never seen or heard reports of this sort of behavior.

So my first question would be "how long have you had this server running?"

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







Re: [htdig] Can htdig kill Linux?

2000-12-06 Thread Max Pyziur



> I just love getting to know new software. There's always some form of
> teething pain. Yesterday, I started running my first set of reasonably
> large htdig/htmerge processes. Came in today to find the Linux server
> (which is running nothing besides basic Mandrake processes and, of
> course, htdig) was deader than a doornail (have to say "deader than"
> because saying "hung more than" would just be too weird).
>
> In any case, I couldn't telnet into the Linux box, couldn't run my KDE
> console, nada.

Can you check your system logs?  Were there any messages on your console?
Did your machine restart ok?
W/o any of that it sounds remotely like a "too many files open" problem.
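
One way to check that theory once the machine is back up, sketched in
Python and assuming a Linux box with /proc mounted. It prints the
per-process open-file limit and the system-wide file handle counts; the
function names are illustrative only:

```python
import resource


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated, unused and maximum handles."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def report():
    """Print the open-file limits that a long dig could run into."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"per-process open-file limit: soft={soft} hard={hard}")
    try:
        with open("/proc/sys/fs/file-nr") as handle:
            allocated, unused, maximum = parse_file_nr(handle.read())
    except OSError:
        # Not Linux, or /proc is not mounted.
        print("no /proc/sys/fs/file-nr on this system")
        return
    print(f"system-wide file handles: {allocated} allocated, limit {maximum}")


if __name__ == "__main__":
    report()
```

If the allocated count sits close to the limit while a dig is running,
that would support the "too many files open" guess; it proves nothing
about a crash that has already happened.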

> I've never seen Linux hang like that before. Almost makes me wish for
> NT.
>
> But the net of it is this: is htdig liable to do this? Can I count on
> using htdig in a production environment or do I need to go back to
> square one? What's your experience? Advice?

Testimonial:
We have about 15,000 pages on our primary site, 6,000 on one of our
virtual hosted sites.  The site is in two languages, English and
Ukrainian, with some items in a third, Russian.  The locale is set for
uk_UA.cp1251.

Our machine is currently two P3-550s, 512MB RAM, two 20GB HDs; our
OS is RH6.2 with most if not all of the updates.

We rebuilt the htdig RPMs so that they would install to suit the
peculiarities of our filesystem.

Indexing each site takes about twenty to thirty minutes.  Admittedly, we
have the firepower (CPUs) to do that.

> Thanks in advance,
>
> David




Max Pyziur BRAMA - Gateway Ukraine
[EMAIL PROTECTED]  http://www.brama.com/







[htdig] 3.1.5 Compile problems on Linux

2000-12-06 Thread Foerst, Daniel P.

Hey all,

I'm new to this list but have gone through the FAQ and have been looking
through the Search Engine for people with a similar if not the exact
same problem.

I am using RedHat 6.2 with GCC 2.95.2 with GNU ld 2.9.5, and I have
libstdc++ 2.9.0-30 installed (latest version). This is htdig-3.1.5 

I am not able to figure out what is going wrong.. any assistance you can
lend is greatly appreciated!

Thanks much!

-dan

I run the configure and have the following...

##
## CONFIG
##
## This file is part of ht://Dig
##

#
# These variables are set by configure
#
# This specifies the root of the directory tree to be used by ht://Dig
prefix= /home2/htdig

# This specifies the root of the directory tree to be used for programs
# installed by ht://Dig
exec_prefix=${prefix}

#
# Please modify the variables below to reflect your preferences.
#

#
# DEST
#
# This specifies the root of the directory tree to be used by ht://Dig
#
DEST=  $(prefix)

#
# BIN_DIR
# Set this macro to where you want the binaries to be installed.
#
BIN_DIR=   $(exec_prefix)/bin

#
# CONFIG_DIR
# This is the directory that contains ht://Dig configuration files
#
CONFIG_DIR=$(DEST)/conf

#
# COMMON_DIR
# This is the directory for files that can be shared between different
# databases.
#
COMMON_DIR=$(DEST)/common

#
# DATABASE_DIR
# The default directory where the search databases will reside.
#
DATABASE_DIR=  $(DEST)/db

#
# DEFAULT_CONFIG_FILE
# This macro defines where the various programs will look for a
# configuration file.
#
DEFAULT_CONFIG_FILE=   $(CONFIG_DIR)/htdig.conf

#
# CGIBIN_DIR
# The directory where your HTTP server looks for CGI programs.  This is
# where htsearch will get installed.
#
CGIBIN_DIR= /sys3/apache-1.3.14/cgi-bin

#
# IMAGE_DIR
# Define this to be a place that can be accessed by your web server.
# This is where a couple of images will be placed.
#
IMAGE_DIR=  /sys3/apache-1.3.14/htdocs/htdig

#
# IMAGE_URL_PREFIX
# This is the URL to prefix the images placed in IMAGE_DIR.
#
IMAGE_URL_PREFIX=/htdig

#
# SEARCH_DIR
# Set this to the absolute path where you want the sample search form to
# be installed.
#
SEARCH_DIR= /sys3/apache-1.3.14/htdocs/htdig

#
# SEARCH_FORM
# Set this to the name you want to give to the search form.  This form
# will be located in the SEARCH_DIR directory.
#
SEARCH_FORM=search.html

When I run make, everything works well, but then this slew of errors
takes place.

Entering directory `/sys2/installs/htdig-3.1.5/htfuzzy'
gcc -o htfuzzy -L../htlib -L../htcommon -L../db/dist -L/usr/lib
Endings.o EndingsDB.o Exact.o Fuzzy.o Metaphone.o Soundex.o
SuffixEntry.o Synonym.o htfuzzy.o Substring.o Prefix.o
../htcommon/libcommon.a ../htlib/libht.a ../db/dist/libdb.a 
EndingsDB.o: In function `Endings::createDB(Configuration &)':
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:46: undefined reference
to `cout'
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:46: undefined reference
to `ostream::operator<<(char const *)'
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:52: undefined reference
to `cout'
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:52: undefined reference
to `ostream::operator<<(char const *)'
EndingsDB.o: In function `Endings::createRoot(Dictionary &, char *, char
*, char *)':
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:165: undefined reference
to `cout'
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:165: undefined reference
to `ostream::operator<<(char const *)'
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:165: undefined reference
to `ostream::operator<<(int)'
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:165: undefined reference
to `ostream::operator<<(char)'
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:166: undefined reference
to `cout'
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:166: undefined reference
to `ostream::flush(void)'
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:180: undefined reference
to `cout'
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:180: undefined reference
to `ostream::operator<<(char const *)'
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:180: undefined reference
to `ostream::operator<<(char const *)'
EndingsDB.o: In function `Endings::createRoot(Dictionary &, char *, char
*, char *)':
/sys2/gcc/lib/gcc-lib/i686-pc-linux-gnu/2.95.2/../../../../include/g++-3
/iostream.h:106: undefined reference to `endl(ostream &)'
/sys2/gcc/lib/gcc-lib/i686-pc-linux-gnu/2.95.2/../../../../include/g++-3
/iostream.h:106: undefined reference to `cout'
/sys2/gcc/lib/gcc-lib/i686-pc-linux-gnu/2.95.2/../../../../include/g++-3
/iostream.h:106: undefined reference to `endl(ostream &)'
EndingsDB.o: In function `Endings::expandWord(String &, List &,
Dictionary &, char *, char *)':
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:315: undefined reference
to `cout'
/sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:315: undefined reference
to 

Re: [htdig] Htdig with geramn umlaut under slackware

2000-12-06 Thread Gilles Detillieux

According to Jun Dong ([EMAIL PROTECTED]):
> Thanks for your tips.
> In the Slackware 7.0 packages there are no LC_CTYPE, LC_*, etc. files
> under /usr/lib/locale/de or deutsch.
> Under /usr/lib/locale/de there is only the directory LC_MESSAGES.
> I copied the directory de_DE, which includes all the LC_* files, from
> SUSE 6.2 to Slackware's /usr/lib/locale and made a symbolic link
> de -> de_DE.
> With your testlocale.cc code, after compiling it, I ran the command
> "testlocale de" and the screen printed out the German accents with
> umlauts exactly.
> But unfortunately htdig still does not work with German accents, no
> matter how exactly I configure htdig.conf. This really is a system
> problem in Slackware.

The problem with copying from a different system is that the C library
may be different, and therefore may require a different set of file
formats for locale support.  This was the case in the transition from
libc5 to glibc.  However, if testlocale.c did recognize the German
umlauts as alphanumeric, then it would suggest that things are mostly
working correctly.  I don't know why, but there are a few systems where
this test program works, but htdig's locale support doesn't.  I don't
know what else to point the finger at besides the C library, though.

> On the other hand, I found the tips from:
> ftp://sol.ccsf.cc.ca.edu/htdig/paches/3.1.5/accents.zip.README
> I modified HTML.cc and htsearch.cc and recompiled htdig, with no
> locale defined this time. Finally htdig is successfully installed
> with German accents.
> You can find the URL where I installed htdig:
> http://www.homepagemagazin.de/htdig/

The problem with the accents.zip patch is that it ends up stripping
off all accents by converting all accented letters in the ISO-8859-1
character set to their unaccented counterparts.  So, the excerpts won't
contain the accents.  While this isn't as nice as the accents.5 patch,
which adds accent support as a new fuzzy match method, the patch you
used is at least better than nothing for a system that doesn't properly
support locales.

> Gilles Detillieux wrote:
>
>> I believe there are still problems with locale support on Slackware Linux
>> systems.  See the thread entitled "Portuguese" from this past May:
>>
>> http://www.htdig.org/mail/2000/05/index.html#61
>>
>> I never did get a followup message from Rodrigo indicating whether he had
>> found a solution, but you may want to try the tips I gave him.

-- 
Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930






Re: [htdig] 3.1.5 Compile problems on Linux

2000-12-06 Thread Gilles Detillieux

According to Foerst, Daniel P.:
> I am using RedHat 6.2 with GCC 2.95.2 with GNU ld 2.9.5, and I have
> libstdc++ 2.9.0-30 installed (latest version). This is htdig-3.1.5
>
> I am not able to figure out what is going wrong.. any assistance you can
> lend is greatly appreciated!
...
> I run the configure and have the following...
...
> prefix= /home2/htdig
>
> # This specifies the root of the directory tree to be used for programs
> # installed by ht://Dig
> exec_prefix=${prefix}

I'm not positive about this, but I think in makefiles like this one, you
need to use the syntax $(prefix), and not ${prefix} (i.e. use parentheses
instead of braces).

...
> When I run make, everything works well, but then this slew of errors
> takes place.
>
> Entering directory `/sys2/installs/htdig-3.1.5/htfuzzy'
> gcc -o htfuzzy -L../htlib -L../htcommon -L../db/dist -L/usr/lib
> Endings.o EndingsDB.o Exact.o Fuzzy.o Metaphone.o Soundex.o
> SuffixEntry.o Synonym.o htfuzzy.o Substring.o Prefix.o
> ../htcommon/libcommon.a ../htlib/libht.a ../db/dist/libdb.a
> EndingsDB.o: In function `Endings::createDB(Configuration &)':
> /sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:46: undefined reference
> to `cout'
> /sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:46: undefined reference
> to `ostream::operator<<(char const *)'
> /sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:52: undefined reference
> to `cout'
> /sys2/installs/htdig-3.1.5/htfuzzy/EndingsDB.cc:52: undefined reference
> to `ostream::operator<<(char const *)'

All of these should be in the libstdc++ library.  However, the makefile
is trying to link these with gcc rather than g++ or c++, which is probably
a big part of the problem.  I suspect something went wrong during the
run of ./configure, most likely because your C++ compiler and libraries
aren't installed where the configure program expected to find them.

...
> /sys2/gcc/lib/gcc-lib/i686-pc-linux-gnu/2.95.2/../../../../include/g++-3
> /iostream.h:106: undefined reference to `endl(ostream &)'
> /sys2/gcc/lib/gcc-lib/i686-pc-linux-gnu/2.95.2/../../../../include/g++-3
> /iostream.h:106: undefined reference to `cout'
> /sys2/gcc/lib/gcc-lib/i686-pc-linux-gnu/2.95.2/../../../../include/g++-3
> /iostream.h:106: undefined reference to `endl(ostream &)'

These error messages suggest that the C++ header files are not in the
standard location.  The compiler found them OK, but things are messing
up at the linking stage.  Is there a reason why you didn't just use
the egcs-c++ and libstdc++ RPM packages that came with Red Hat 6.2?
Those work fine with ht://Dig.  I suspect that your setup as it is now
wouldn't work well with any software that needs C++.

-- 
Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930






Re: [htdig] Can htdig kill Linux? (redux)

2000-12-06 Thread Geoff Hutchison

On Wed, 6 Dec 2000, David Gewirtz wrote:

> * Is there a way to tell what files got chomped by the fsck and have
> lost+found nodes?

Nope. That's why they're "lost and found." You can, however, take a look
at what's in there.

> * Is there a way to check a log for htdig?

Not unless you were writing one as part of the task. For example, many
people have a cron job that runs htdig/htmerge and sends the results as a
mail message. This would usually involve a temporary file, which you could
check.
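
A sketch of such a wrapper for cron, in Python. It runs a list of
commands in order, appends their output to a log file and stops at the
first failure. The function name run_and_log and the paths in the
example call are assumptions for illustration; only the -v and -c
options are taken from htdig and htmerge themselves:

```python
import subprocess
from datetime import datetime


def run_and_log(commands, log_path):
    """Run each command in order and append its output to log_path.

    Stops at the first command that fails and returns its exit status
    (0 if every command succeeded, 127 if a command could not start).
    """
    with open(log_path, "a", encoding="utf-8", errors="replace") as log:
        for cmd in commands:
            stamp = datetime.now().isoformat(timespec="seconds")
            log.write(f"[{stamp}] running: {' '.join(cmd)}\n")
            log.flush()
            try:
                result = subprocess.run(
                    cmd,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,  # keep errors in the same log
                    text=True,
                    errors="replace",
                )
            except OSError as exc:
                log.write(f"could not start: {exc}\n")
                return 127
            log.write(result.stdout)
            log.write(f"exit status: {result.returncode}\n")
            if result.returncode != 0:
                return result.returncode
    return 0


# Example call (hypothetical install and config paths):
# run_and_log(
#     [["/opt/htdig/bin/htdig", "-v", "-c", "/opt/htdig/conf/htdig.conf"],
#      ["/opt/htdig/bin/htmerge", "-v", "-c", "/opt/htdig/conf/htdig.conf"]],
#     "/var/tmp/htdig-run.log",
# )
```

Because the log is flushed before each command starts, the last
"running:" line shows which step was in progress if the machine dies
mid-run.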

> * Is an fsck -f -y good enough, or should I reformat and reinstall the hard
> drive?

fsck is usually just fine. If you see repeated disk problems, then you may
want to do a reformatting with options to get rid of bad sectors. With the
current prices of disks, it's also a reasonable option to just buy a new
disk if there seem to be media problems.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







[htdig] SQL handling start_url

2000-12-06 Thread Curtis Ireland

Hypothetical Situation:

I have an SQL database table of links I wish to present someone visiting
my site. However, I would like to make these links searchable from my
site. Normally, if these links were static, I would just list them in
the htdig.conf file.

Is there any way to have start_url get its list from an SQL back-end?
Has anyone already built a patch to handle this?

Here are a couple of solutions I can think of to bypass the problem,
but I'm sure I'm not alone in desiring this feature.

1) Build a PHP page with links to all the sites we want to index.
Have htDig use this as its start_url
2) Before htDig starts its database build, dump all the links to a text
file and have the htdig.conf include this file

The one problem with these two solutions is how would the limit_urls_to
variable work? I want to make sure the links are properly indexed
without going past the linked site.
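
Solution 2 could be sketched roughly as follows, in Python. This is only
an illustration under stated assumptions: sqlite3 stands in for whatever
SQL back-end is really in use, the table links(url) and the file names
are hypothetical, and limit_urls_to is handled by reducing each link to
its scheme and host so the dig stays inside the linked sites:

```python
import sqlite3
from urllib.parse import urlsplit


def fetch_urls(conn):
    """Return the URLs in the hypothetical links(url) table, sorted, no repeats."""
    rows = conn.execute("SELECT DISTINCT url FROM links ORDER BY url")
    return [row[0] for row in rows]


def site_prefixes(urls):
    """Reduce each URL to scheme://host/ for use with limit_urls_to."""
    prefixes = []
    for url in urls:
        parts = urlsplit(url)
        prefix = f"{parts.scheme}://{parts.netloc}/"
        if prefix not in prefixes:
            prefixes.append(prefix)
    return prefixes


def write_list(path, items):
    """Write one item per line to path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(items) + "\n")


# Run before each dig (hypothetical paths):
#   conn = sqlite3.connect("/path/to/links.db")
#   urls = fetch_urls(conn)
#   write_list("/path/to/start.urls", urls)
#   write_list("/path/to/limit.urls", site_prefixes(urls))
#
# htdig.conf would then include both files with backquotes:
#   start_url:     `/path/to/start.urls`
#   limit_urls_to: `/path/to/limit.urls`
```

Limiting to scheme and host keeps htdig from wandering off to other
servers, but it will still follow links within each listed site.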

Just something for everyone to wrap your heads around.
-C

--
Curtis Ireland  - [EMAIL PROTECTED]
Solidum Systems - http://www.solidum.com
(T) (613)724-6004 x284  - (F) (613)724-6008






Re: [htdig] Can htdig kill Linux?

2000-12-06 Thread Clint Gilders

David Gewirtz wrote:
 
> I just love getting to know new software. There's always some form of
> teething pain. Yesterday, I started running my first set of reasonably
> large htdig/htmerge processes. Came in today to find the Linux server
> (which is running nothing besides basic Mandrake processes and, of course,
> htdig) was deader than a doornail (have to say "deader than" because saying
> "hung more than" would just be too weird).

I use Mandrake at home and love it, but have had nothing but problems
with it in a server environment.  Our lone Linux server (the rest are
FreeBSD) has been crashing daily (hanging, no telnet, no ftp, etc.)
since we installed apache/mod_ssl.  Even before that it wasn't the most
reliable box going.  If you are going to continue to use it in a
production environment I suggest not running X or KDE, as these can eat
up 60% of your CPU.

We have indexed well over 200,000 documents with htdig running on a
single FreeBSD machine without so much as a hiccup.

> Almost makes me wish for NT.

Be careful what you wish for!  You just might get it.   Ahh!!! The
horror.

-- 
Clint Gilders
Servermaster Onlinehobbyist Inc.
[EMAIL PROTECTED]






Re: [htdig] Htdig in spanish

2000-12-06 Thread Geoff Hutchison

At 5:59 PM -0600 12/6/00, Heriberto Cantu wrote:
> It was quick work, so it probably needs a second review and the
> completion of the synonyms.es file.
>
> I think it is a good idea to have this package on the www.htdig.org
> site, but couldn't find a way to upload it.

You can try ftp://www.htdig.org/upload/ but it might be worth 
thinking about a "File Upload" form. If anyone has coded a CGI like 
this (and can ensure that files transfer in binary form), it might be 
worth trying.

And, of course, as you did earlier today, you can send files to me or 
Gilles to place in the repository. (Keep in mind, even with 
uploading, someone will still need to move it into place.)

It should be mirrored within an hour or so.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

