[htdig] Newbie Questions.

2000-03-12 Thread Sphboc
I just recently got 3.1.5 installed; appears to be operating properly. Couple of issues, however, related to what I'm trying to accomplish with it. A. Is there any way of intercepting (and "filtering") the output of htsearch -- after this is generated and just prior to its being actually

Re: [htdig] Newbie Questions.

2000-03-13 Thread Sphboc
In a message dated 3/13/00 3:43:35 AM US Mountain Standard Time, [EMAIL PROTECTED] writes: Hi, [EMAIL PROTECTED] wrote: I just recently got 3.1.5 installed; appears to be operating properly. Couple of issues, however, related to what I'm trying to accomplish with it. A. Is

[htdig] Htdig 3.1.5 -- Infinite Loop Possible?

2000-03-15 Thread Sphboc
Have made a few runs of this which, after several hours, had to be terminated; appeared that same pages were being, repeatedly, re-indexed. In event that page a points to page b, and b points back to a, how does htdig avoid an infinite loop? In particular, is it necessary to limit the

[htdig] control of htmerge.

2000-03-15 Thread Sphboc
If I need to specify both an alternate conf file and a merge file, is the proper syntax: ../htmerge -c alternate.conf -m merge.conf i.e., is a - required before both of the option letters? Also, is the whitespace between the option letters and the filename optional?

[htdig] databases (3.1.5)

2000-03-16 Thread Sphboc
Is there some documentation on the format/content of the databases, as produced by htdig and htmerge? What I'd like to be able to do, if feasible, is to tell, from the databases themselves which url's have been indexed, and ideally the date on which this was done.

Re: [htdig] Introductory questions

2000-03-17 Thread Sphboc
Agree with Carlson's reply, but would add one comment. To re-index a single site, you'll probably be happier using a set of files containing only that site's content. Then, schedule merging of the single-site files with your "master" set. You also need to realize that any merge involving

[htdig] Unable to run htdig against sharperimage.com

2000-03-17 Thread Sphboc
Have installed 3.1.5; been able to index most of the sites in which I'm interested. One which has been consistently failing is per following configuration file: htdig.conf is the vanilla configuration; all other (successful) searches use it as the root and include mod's similar to what

[htdig] Duration of Htsearch Processing (3.1.5)

2000-03-18 Thread Sphboc
We are running into some situations where the duration of htsearch processing -- when a fairly-common word has been sought -- is long enough to cause problems (timeouts in the invoking process). Looking at documentation, it does not appear that there is any option in either the conf file or

[htdig] Htdig/Htmerge -- When pre-existing databases are involved.

2000-03-20 Thread Sphboc
Have installed Release 3.1.5*. Under an approach which searches one url at a time, and always re-initializes that url's files prior to searching, have been able to get all the relevant domains (some 20; call them domain01 thru domain20) combined into one searchable data base; searches appear

[htdig] Need 1GB Disk Space; on Shared Server

2000-03-20 Thread Sphboc
We are in the process of indexing a number of sites. The way we're doing it, the resulting files occupy quite a bit of space. Expected transfer volume, however, is relatively low. (Due to a combination of no great business volume yet, and fact that we return only a very-truncated form of

[htdig] Databases -- Read-access modules. (3.1.5)

2000-03-21 Thread Sphboc
I've gotten 3.1.5 installed and operating. Due to rather specialized requirements, I need to locate some logic which can read and return content from: db.words.db db.docdb db.docs.index Presumably, these are in some fairly-standard database format; if I could determine what

[htdig] 3.1.5 -- Documentation on Database structures available?

2000-03-24 Thread Sphboc
Is there any documentation available, on directly accessing the db.docdb and db.words.db databases? Partly for efficiency, and partly because I want to learn how to accomplish this--from within a Perl script--I'd like to be able to directly access the databases. (Have developed a process

[htdig] (semi-off-topic) Dummy files in Perl?

2000-03-28 Thread Sphboc
Is Perl's behavior specified when I try to write into (ie using the PRINT statement) a file variable which has never been opened? I'd like it to simply discard the output; can I rely upon this happening? (Same idea as DD DUMMY in MVS . . ). Steven P Haver/602-242-9708

[htdig] htdig -- infinite looping (3.1.5) and redirection

2000-03-31 Thread Sphboc
I'm running into some instances where htdig never appears to terminate. Results which have been found, up to the point of termination, appear to be valid as far as they go; I haven't yet tried analyzing the url list for a pattern of repetition. Is there any particular type of problem, within

Re: [htdig] P: access docdb with perl

2000-04-03 Thread Sphboc
I will also be MOST interested in the specifics of the database, and exactly what one has to do to access it with an external program. This is one area which does NOT appear to be covered in the (otherwise-excellent) documentation, or in the FAQ. Hello, i want to access the docdb

[htdig] How to manage infinite-loop conditions in Htsearch. (3.1.5)

2000-05-09 Thread Sphboc
Appears that, in real world, htsearch 3.1.5 will from time to time loop; due basically to configuration file not set up to deal with actual conditions at searched web site(s). Does Unix have any ability to limit elapsed time (and/or disk space) used by an attempt to run htsearch?

Re: [htdig] How to manage infinite-loop conditions in Htsearch. (3.1.5)

2000-05-09 Thread Sphboc
Sorry -- I meant htdig -- the process which actually goes out and searches the website(s). "Seems logical" that, if htdig is initiated from a unix-shell script, there ought to be a way to limit elapsed time; does anyone have a working example of this/equivalent? According to [EMAIL
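One common shell pattern for the elapsed-time limit asked about here is a background watchdog. The following is only a sketch, assuming a POSIX shell; run_with_limit is a hypothetical helper name and is not part of ht://Dig.

```shell
#!/bin/sh
# run_with_limit SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND and sends it SIGTERM if it is still
# running after SECONDS. Returns COMMAND's exit status (non-zero if killed).
run_with_limit() {
    limit=$1
    shift

    # Start the real work in the background and remember its PID.
    "$@" &
    cmd_pid=$!

    # Watchdog: sleep for the limit, then terminate the command.
    # Its output is discarded so it never holds the caller's stdout open.
    (
        sleep "$limit"
        kill "$cmd_pid" 2>/dev/null
    ) >/dev/null 2>&1 &
    watchdog_pid=$!

    # Wait for the command; its status is what the caller sees.
    status=0
    wait "$cmd_pid" || status=$?

    # The command has finished, so the watchdog is no longer needed.
    kill "$watchdog_pid" 2>/dev/null || true
    return "$status"
}
```

A call such as `run_with_limit 7200 "$BINDIR/htdig" -c "$CONFDIR/site.conf"` would stop the run after two hours (the paths and the file name site.conf are assumptions for illustration). On systems that have GNU coreutils, `timeout 7200 command` serves the same purpose. Limiting disk space would need a separate mechanism and is not covered by this sketch.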

[htdig] (OT) Unix Source-Management Software; Source?

2000-08-24 Thread Sphboc
Has anyone come up with a more-or-less integrated approach to issue of (Unix) software version control? Overall intent being to retain a record of the PRIOR content of a production library. Preferably, in a form which would support automated restoration to a previous date and time.

[htdig] Installation material -- 3.1.5

2000-09-05 Thread Sphboc
I've got 3.1.5 up and running. Due to disk-space considerations, am considering removal of the "installation" folder structure. Does not appear to me that this participates in routine execution of htdig . . . am I overlooking anything? === (ls command output follows)

[htdig] (Off Topic) How Intercept TCP Packets?

2000-09-25 Thread Sphboc
I have a Windows 98 system; uses (standard) tcp/ip to allow an application to communicate with the Internet. I'd like to intercept (a copy of) the TCP packets, as received and sent by the application, for recording and inspection. (And, let the packets continue as if no interception were

[htdig] Exclude_urls (3.1.5).

2000-11-01 Thread Sphboc
documentation says: If a URL contains any of the space separated patterns, it will be rejected. Consider the following: Exclude_urls: fuseaction=readmessage (in the config file) http://www.autobytel.com/content/service/index.cfm?fuseaction=readmessagem=30 5id=4f=4: A. Will this url be
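Read literally, the quoted sentence describes a plain substring test. The sketch below illustrates that literal reading only; url_is_excluded is a hypothetical helper, not htdig's own matching code, and any additional processing htdig may apply to URLs or patterns is not modelled.

```shell
#!/bin/sh
# url_is_excluded URL PATTERN [PATTERN...]
# Hypothetical helper: succeeds (status 0) if URL contains any PATTERN
# as a literal substring, mirroring the documentation sentence quoted above.
url_is_excluded() {
    url=$1
    shift
    for pattern in "$@"; do
        case $url in
            # Quoting the pattern makes the shell match it literally.
            *"$pattern"*) return 0 ;;
        esac
    done
    return 1
}
```

Under this reading, `url_is_excluded 'http://www.autobytel.com/content/service/index.cfm?fuseaction=readmessage' 'fuseaction=readmessage'` succeeds, so a URL of that shape would be rejected.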

[htdig] 3.1.5 vs 3.2.0B2.

2000-11-01 Thread Sphboc
I looked in the FAQ; found relatively little discussion of: A. Is it safe to assume that 3.1.5 is regarded as "stable"; unlikely to receive further development effort (barring discovery of a significant problem)? B. Is there a "what's new" list, concerning 3.2? Especially including

[htdig] 3.20b2 -- Phrase Searching?

2000-11-07 Thread Sphboc
This appears to have a pretty-good capability for specification of boolean expressions. I haven't, however, come across any documentation concerning the ability to specify the RELATIVE ORDER in which two or more words appear, or to require that they be adjacent. One example would be "car

[htdig] 3.1.5 -- Wordlist files / space occupancy.

2000-11-07 Thread Sphboc
Do the .wordlist files, created by htdig, serve any useful purpose once they have been input to htmerge? If the database created by htmerge is later merged with another database, is it necessary to read the .wordlist files at this time? (I suspect not, since the information ought to be in

[htdig] 3.20/b2 -- SCORE Variable

2000-11-12 Thread Sphboc
I've been trying to experiment with the various weighting factors; by specifying xxx_FACTOR values in the conf file. Appears that, whatever values I use, long-format displays come out with a single star. Closely related, is there a simple way to actually see the value of $(SCORE) and/or

[htdig] SCORE -- follow-up.

2000-11-12 Thread Sphboc
I did figure out how to SEE the value of SCORE . . conf-file example is actually pretty clear on this. Still interested in how htdig COMPUTES this, however. Steven P Haver/602-242-9708

Re: [htdig] How to exclude part of a html page ?

2000-11-13 Thread Sphboc
Are the noindex_start/noindex_end parameters allowed to have multiple values; such that more than one type of content can be excluded? (Might be "of interest" to document the place/table, in the conf-file-processing logic, where this sort of information is recorded). In a message dated

Re: [htdig] Additional variables for htsearch

2000-11-14 Thread Sphboc
I have a similar interest. Appears that any environment variable, if set prior to invoking htsearch, is retained and can be used in output templates. In Perl, the setting can be accomplished by assignment to $ENV{VARNAME}, where VARNAME is a user-chosen name. This has to be done just prior
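The shell equivalent of assigning to $ENV{VARNAME} is to export the variable before starting the child process. A minimal sketch, assuming a POSIX shell; SEARCH_LABEL is a hypothetical variable name, and a child shell stands in for the htsearch invocation. Whether htsearch then makes the value available to output templates is the poster's observation and is not verified here.

```shell
#!/bin/sh
# SEARCH_LABEL is a hypothetical, user-chosen variable name.
SEARCH_LABEL="spring catalog"
export SEARCH_LABEL   # without export, child processes would not see it

# Stand-in for the htsearch invocation: a child shell reports what it inherited.
child_view=$(sh -c 'printf "%s" "$SEARCH_LABEL"')
echo "child process sees: $child_view"
```

The export has to happen before the child is started, because a process receives its environment at launch.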

[htdig] Redirection of Htdig output -- 3.20b2

2000-11-17 Thread Sphboc
My intent is to capture the STDOUT, from HTDIG, in a disk file. Following code operates as intended (Linux system)
#!/mybin/sh
URLMAIN=mallst
CONFDIR=/htdig3.2b2/sngl/conf
DBDIR=/htdig3.2b2/sngl/data
BINDIR=/htdig3.2b2/bin
# echo "progname = $0 / $URLMAIN"
TMPDIR=$DBDIR
export TMPDIR
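For reference, redirection in a shell script can be scoped to a single command, which leaves the script's own STDOUT untouched. A minimal sketch, assuming a POSIX shell; capture_stdout is a hypothetical helper and is not taken from the original post.

```shell
#!/bin/sh
# capture_stdout FILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with its standard output written to FILE.
# The redirection applies to this one command only, so later commands in
# the calling script still write to the original STDOUT.
capture_stdout() {
    outfile=$1
    shift
    "$@" >"$outfile"
}
```

With the variables from the script above, a call might look like `capture_stdout "$DBDIR/htdig.out" "$BINDIR/htdig" -c "$CONFDIR/$URLMAIN.conf"`; the output file name and the conf file naming are assumptions. Writing `"$@" >"$outfile" 2>&1` instead would capture error messages in the same file.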

[htdig] [off topic] -- how to reset STDOUT Assignment

2000-11-17 Thread Sphboc
Have a Perl Script which invokes execution of htdig and htmerge. Similar purpose to rundig, but use of shell scripts, in the specific environment, is not practicable. I want to direct the STDOUT, from htdig/htmerge, to disk files. open (STDOUT,"diskfile"), followed by system commands to

[htdig] 3.20b2 -- oddity

2000-11-29 Thread Sphboc
I tried to do htsearch, using the following .conf file: site_id:10009 include:/www/vhosts/a/autosearchusa.com/htdig3.2b2/conf/cv_0.conf database_dir: /www/vhosts/a/autosearchusa.com/htdocs/www/u-wrk /sngl/data database_base: ${database_dir}/dt_${site_id}

[htdig] 3.20b(2/3)?

2000-11-29 Thread Sphboc
A. Is there a projected release date for 3.20B3? B. Does the "collections" feature, as documented for 3.20b2, appear to be reliable? Have any comparisons of search efficiency, as opposed to searching a database created by merging the constituent components, been done? C. (Sorry to be

[htdig] 3.20b2 ${common_dir} -- reliably alterable?

2000-12-03 Thread Sphboc
If the system is COMPILED with a standard value for this, but a differing specification is present at the top of a ".conf" file: A. Will the COMPILED value be reliably and completely overridden? B. Is it necessary that the override be in the FIRST encountered ".conf" file? My primary

[htdig] (Off Topic) - use of #!/usr/bin/perl in Windows environment.

2001-01-11 Thread Sphboc
I have a number of cgi scripts running in a Linux environment. First line of all such, for apache/unix, is: #!/usr/bin/perl Under Apache/Linux, this works as expected. Just recently installed Apache (1.13) on a Windows 98 machine. In this environment, perl.exe (5.005_03) resides in /perl/bin.

[htdig] Htdig 3.20b3 -- installation problems.

2001-01-18 Thread Sphboc
I've been able to successfully install (and execute, giving valid results), the 1/14/01 snapshot of this. To get this to work, however, I had to invoke the "--without-zlib" option. I obtained zlib113, uploaded it to the server, decompressed, and attempted to compile. (I do NOT have authority