I just recently got 3.1.5 installed; it appears to be operating properly.
A couple of issues, however, relate to what I'm trying to accomplish with it.
A. Is there any way of intercepting (and "filtering") the output of htsearch
-- after this is generated and just prior to its being actually
I have made a few runs of this which, after several hours, had to be
terminated; it appeared that the same pages were being repeatedly re-indexed.
In the event that page a points to page b, and b points back to a, how does
htdig avoid an infinite loop?
In particular, is it necessary to limit the
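The standard crawler answer is a visited set: every URL is recorded on first
contact and skipped on any later encounter, so a -> b -> a terminates. A
minimal shell sketch of the idea (that this is exactly how htdig's internals
work is an assumption; only the general technique is shown):

```shell
#!/bin/sh
# Visited-set sketch: skip any URL already recorded, so mutual links
# between pages cannot cause an infinite loop.
seen=$(mktemp)

visit() {
    grep -qxF "$1" "$seen" && return 0   # already indexed: skip it
    echo "$1" >> "$seen"
    echo "indexing $1"
}

visit "http://a/"    # prints: indexing http://a/
visit "http://b/"    # prints: indexing http://b/
visit "http://a/"    # prints nothing -- already seen
rm -f "$seen"
```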
If I need to specify both an alternate conf file and a merge file, is the
proper syntax:
../htmerge -c alternate.conf -m merge.conf
ie, is a - required before both of the option letters?
Also, is the whitespace, between the option letters and the filename,
optional?
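For what it's worth, single-letter Unix options in this style are normally
parsed getopt-fashion: each option letter needs its own `-`, and the
argument may be attached or separated. A shell sketch of that convention
(that htmerge's parser behaves exactly like getopts is an assumption):

```shell
#!/bin/sh
# getopts accepts an option's argument either attached or separated,
# and each option letter carries its own leading dash.
parse() {
    conf=""; merge=""
    while getopts "c:m:" opt; do
        case $opt in
            c) conf=$OPTARG ;;
            m) merge=$OPTARG ;;
        esac
    done
}

OPTIND=1; parse -c alternate.conf -m merge.conf
echo "$conf $merge"     # prints: alternate.conf merge.conf
OPTIND=1; parse -calternate.conf -mmerge.conf
echo "$conf $merge"     # prints: alternate.conf merge.conf
```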
Is there some documentation on the format/content of the databases, as
produced by htdig and htmerge?
What I'd like to be able to do, if feasible, is to tell, from the databases
themselves, which URLs have been indexed, and ideally the date on which this
was done.
Agree with Carlson's reply, but would add one comment. To re-index a single
site, you'll probably be happier using a set of files containing only that
site's content. Then, schedule merging of the single-site files with your
"master" set.
You also need to realize that any merge involving
Have installed 3.1.5; been able to index most of the sites in which I'm
interested.
One which has been consistently failing is per following configuration file:
htdig.conf is the vanilla configuration; all other (successful) searches use
it as the root and include mod's similar to what
We are running into some situations where the duration of htsearch processing
-- when a fairly-common word has been sought -- is long enough to cause
problems (timeouts in the invoking process).
Looking at documentation, it does not appear that there is any option in
either the conf file or
Have installed release 3.1.5. Under an approach which searches one URL at a
time, and always re-initializes that URL's files prior to searching, have
been able to get all the relevant domains (some 20; call them domain01 thru
domain20) combined into one searchable database; searches appear
We are in the process of indexing a number of sites. The way we're doing it,
the resulting files occupy quite a bit of space. Expected transfer volume,
however, is relatively low. (Due to a combination of no great business volume
yet, and the fact that we return only a very-truncated form of
I've gotten 3.1.5 installed and operating. Due to rather specialized
requirements, I need to locate some logic which can read and return content
from:
db.words.db
db.docdb
db.docs.index
Presumably, these are in some fairly-standard database format; if I could
determine what
Is there any documentation available, on directly accessing the db.docdb and
db.words.db databases?
Partly for efficiency, and partly because I want to learn how to accomplish
this--from within a Perl script--I'd like to be able to directly access the
databases. (Have developed a process
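Until the record format is pinned down, a crude read-only peek is possible
because document URLs are stored as byte strings inside the files. The
sketch below just extracts printable runs that look like URLs; it is not a
real Berkeley DB parser, and the demo file is fabricated. (From Perl, the
DB_File or BerkeleyDB modules are the proper route, library version
permitting.)

```shell
#!/bin/sh
# Crude peek: pull printable runs out of a binary file and keep the
# ones that look like URLs. Not a Berkeley DB parser.
list_urls() {
    tr -cs '[:print:]' '\n' < "$1" | grep -E '^https?://' | sort -u
}

demo=$(mktemp)   # fabricated stand-in for db.docdb
printf 'x\0http://example.com/a\0y\0http://example.com/b\0' > "$demo"
list_urls "$demo"    # prints the two URLs, one per line
rm -f "$demo"
```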
Is Perl's behavior specified when I try to write into (i.e., using the print
statement) a file variable which has never been opened?
I'd like it to simply discard the output; can I rely upon this happening?
(Same idea as DD DUMMY in MVS . . ).
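Relying on an unopened handle is risky; Perl warns ("print on unopened
filehandle") and the behavior is not a documented discard. The dependable
Unix equivalent of DD DUMMY is an explicit /dev/null; a shell sketch:

```shell
#!/bin/sh
# Explicit discard: anything sent through discard() disappears into
# /dev/null, the Unix analogue of DD DUMMY.
discard() { cat > /dev/null; }

echo "thrown away" | discard     # produces no output
echo "kept"                      # prints: kept
```

The Perl equivalent is `open(my $null, '>', '/dev/null')` and printing to
`$null`, which can be relied on, unlike the unopened-handle case.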
Steven P Haver/602-242-9708
I'm running into some instances where htdig never appears to terminate.
Results which have been found, up to the point of termination, appear to be
valid as far as they go; I haven't yet tried analyzing the URL list for a
pattern of repetition.
Is there any particular type of problem, within
I will also be MOST interested in the specifics of the database, and exactly
what one has to do to access it with an external program. This is one area
which does NOT appear to be covered in the (otherwise-excellent)
documentation, or in the FAQ.
Hello,
I want to access the docdb
Appears that, in the real world, htsearch 3.1.5 will from time to time loop;
due basically to the configuration file not being set up to deal with actual
conditions at the searched web site(s).
Does Unix have any ability to limit elapsed time (and/or disk space) used by
an attempt to run htsearch?
Sorry --
I meant htdig -- the process which actually goes out and searches the
website(s). "Seems logical" that, if htdig is initiated from a unix-shell
script, there ought to be a way to limit elapsed time; does anyone have a
working example of this/equivalent?
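One working pattern is a watchdog subshell: start the dig in the background,
start a timer that kills it after N seconds, and wait. (Note that `ulimit -t`
limits CPU seconds, not elapsed time, so it won't catch a dig that is mostly
waiting on the network.) A sketch, with `sleep` standing in for the htdig
invocation:

```shell
#!/bin/sh
# Watchdog: run a command, kill it if it exceeds the elapsed-time limit.
run_with_limit() {
    limit=$1; shift
    "$@" &                                         # the real work (htdig ...)
    pid=$!
    ( sleep "$limit"; kill "$pid" 2>/dev/null ) &  # the timer
    watchdog=$!
    wait "$pid"
    status=$?
    kill "$watchdog" 2>/dev/null                   # cancel timer on normal exit
    return $status
}

# "sleep 5" stands in for an htdig run that loops:
run_with_limit 1 sleep 5 || echo "terminated after limit"
```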
According to [EMAIL
Has anyone come up with a more-or-less integrated approach to the issue of
(Unix) software version control?
Overall intent being to retain a record of the PRIOR content of a production
library. Preferably, in a form which would support automated restoration to
a previous date and time.
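Short of adopting a real version-control system (RCS or CVS were the usual
Unix answers), a date-stamped tar snapshot taken before each promotion gives
exactly that: a record of prior production content, restorable to a given
date and time. A sketch (the function name, paths, and naming scheme are
assumptions):

```shell
#!/bin/sh
# Snapshot: archive a directory under a timestamped name so any prior
# state can be restored by extracting the matching tarball.
snapshot() {
    src=$1; dest=$2
    stamp=$(date +%Y%m%d%H%M%S)
    archive="$dest/prod-$stamp.tar"
    tar -cf "$archive" -C "$src" .
    echo "$archive"
}

work=$(mktemp -d); store=$(mktemp -d)
echo "v1 content" > "$work/app.cgi"
saved=$(snapshot "$work" "$store")
echo "saved snapshot: ${saved##*/}"
```

Restoration is then `tar -xf prod-<stamp>.tar -C <target>` for the stamp
closest to the desired date and time.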
I've got 3.1.5 up and running. Due to disk-space considerations, am
considering removal of the "installation" folder structure.
It does not appear to me that this participates in routine execution of
htdig . . . am I overlooking anything?
=== (ls command output follows)
I have a Windows 98 system; it uses (standard) TCP/IP to allow an application
to communicate with the Internet.
I'd like to intercept (a copy of) the TCP packets, as received and sent by
the application, for recording and inspection. (And, let the packets
continue as if no interception were
The documentation says:
If a URL contains any of the space separated patterns, it will be rejected.
Consider the following:
exclude_urls: fuseaction=readmessage (in the config file)
http://www.autobytel.com/content/service/index.cfm?fuseaction=readmessage&m=305&id=4&f=4
A. Will this url be
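Per the documentation quoted above, exclude_urls is a plain substring test:
if any listed pattern occurs anywhere in the URL, the URL is rejected. A
shell sketch of that test (the case construct mimics the documented
behavior; htdig's actual matching code is not reproduced here):

```shell
#!/bin/sh
# Substring rejection, as exclude_urls is documented to behave.
check_url() {
    case $1 in
        *"$2"*) echo "rejected" ;;
        *)      echo "accepted" ;;
    esac
}

check_url "http://www.autobytel.com/content/service/index.cfm?fuseaction=readmessage" \
          "fuseaction=readmessage"     # prints: rejected
check_url "http://www.autobytel.com/content/service/index.cfm" \
          "fuseaction=readmessage"     # prints: accepted
```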
I looked in the FAQ; found relatively little discussion of:
A. Is it safe to assume that 3.1.5 is regarded as "stable"; unlikely to
receive further development effort (barring discovery of a significant
problem)?
B. Is there a "what's new" list, concerning 3.2? Especially including
This appears to have a pretty-good capability for specification of boolean
expressions.
I haven't, however, come across any documentation concerning the ability to
specify the RELATIVE ORDER in which two or more words appear, or to require
that they be adjacent.
One example would be "car
Do the .wordlist files, created by htdig, serve any useful purpose once they
have been input to htmerge?
If the database created by htmerge is later merged with another database, is
it necessary to read the .wordlist files at this time? (I suspect not, since
the information ought to be in
I've been trying to experiment with the various weighting factors; by
specifying xxx_FACTOR values in the conf file.
Appears that, whatever values I use, long-format displays come out with a
single star.
Closely related, is there a simple way to actually see the value of $(SCORE)
and/or
I did figure out how to SEE the value of SCORE . . conf-file example is
actually pretty clear on this.
Still interested in how htdig COMPUTES this, however.
Are the noindex_start/noindex_end parameters allowed to have multiple values;
such that more than one type of content can be excluded?
(Might be "of interest" to document the place/table, in the
conf-file-processing logic, where this sort of information is recorded).
In a message dated
I have a similar interest. Appears that any environment variable, if set
prior to invoking htsearch, is retained and can be used in output templates.
In Perl, the setting can be accomplished by assignment to $ENV{VARNAME},
where VARNAME is a user-chosen name. This has to be done just prior
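The mechanism is ordinary environment inheritance: a variable exported (or
set in %ENV in Perl) before the child process is started shows up in the
child's environment. A shell demonstration, with `sh -c` standing in for the
htsearch invocation:

```shell
#!/bin/sh
# A variable set in the parent's environment is visible to any child
# process it starts ("sh -c ..." stands in for invoking htsearch).
child_sees=$(VARNAME="hello" sh -c 'echo "$VARNAME"')
echo "child saw: $child_sees"     # prints: child saw: hello
```

The Perl equivalent is `$ENV{VARNAME} = 'hello';` followed by
`system('htsearch ...')`.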
My intent is to capture the STDOUT, from HTDIG, in a disk file.
Following code operates as intended (Linux system)
#!/bin/sh
URLMAIN=mallst
CONFDIR=/htdig3.2b2/sngl/conf
DBDIR=/htdig3.2b2/sngl/data
BINDIR=/htdig3.2b2/bin
# echo "progname = $0 / $URLMAIN"
TMPDIR=$DBDIR
export TMPDIR
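One way to capture everything the script (and htdig within it) writes is to
redirect the script's own stdout with `exec`; every later command inherits
the redirection. A self-contained sketch, where the inner echo stands in for
the htdig invocation:

```shell
#!/bin/sh
# Redirect this script's stdout into a log file, run the work, restore.
log=$(mktemp)
exec 3>&1          # keep a copy of the original stdout on fd 3
exec > "$log"      # from here on, all stdout lands in $log

echo "output from a dig run"    # stand-in for: $BINDIR/htdig -i -c ...

exec 1>&3 3>&-     # restore the original stdout
echo "captured: $(cat "$log")"  # prints: captured: output from a dig run
rm -f "$log"
```

Adding `2>&1` after the `exec > "$log"` line would fold stderr into the same
file.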
Have a Perl Script which invokes execution of htdig and htmerge. Similar
purpose to rundig, but use of shell scripts, in the specific environment, is
not practicable.
I want to direct the STDOUT, from htdig/htmerge, to disk files. open
(STDOUT,"diskfile"), followed by system commands to
I tried to do htsearch, using the following .conf file:
site_id:10009
include:/www/vhosts/a/autosearchusa.com/htdig3.2b2/conf/cv_0.conf
database_dir: /www/vhosts/a/autosearchusa.com/htdocs/www/u-wrk/sngl/data
database_base: ${database_dir}/dt_${site_id}
A. Is there a projected release date for 3.2.0b3?
B. Does the "collections" feature, as documented for 3.2.0b2, appear to be
reliable? Have any comparisons of search efficiency, as opposed to searching
a database created by merging the constituent components, been done?
C. (Sorry to be
If the system is COMPILED with a standard value for this, but a differing
specification is present at the top of a ".conf" file:
A. Will the COMPILED value be reliably and completely overridden?
B. Is it necessary that the override be in the FIRST encountered ".conf"
file?
My primary
I have a number of cgi scripts running in a Linux environment. First line of all such, for apache/unix, is:
#!/usr/bin/perl
Under Apache/Linux, this works as expected.
Just recently installed Apache (1.13) on a Windows 98 machine.
In this environment, perl.exe (5.005_03) resides in /perl/bin.
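Apache on Win32 also reads the `#!` line to locate the interpreter (unless
ScriptInterpreterSource is set to use the registry instead), so the likely
fix, assuming the layout described, is a shebang that matches where perl.exe
actually lives:

```
#!/perl/bin/perl
```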
I've been able to successfully install (and execute, giving valid results),
the 1/14/01 snapshot of this.
To get this to work, however, I had to invoke the "--without-zlib" option.
I obtained zlib113, uploaded it to the server, decompressed, and attempted to
compile. (I do NOT have authority