Re: [htdig] Can't get my search to update correctly..

2000-10-05 Thread Ramon Gonzalez

Is /www/itss above or below your main web root directory?

Although I am still learning the ropes with ht://Dig, this is one test I have
done related to this problem.

I have a root web directory (/home/httpd/html) with a few test pages that
hyperlink to each other. In the same directory I have the Apache server
documentation under "manuals/misc" but none of my pages refer to the Apache
manuals. If I add a hidden hyperlink in this fashion:

[empty anchor tag pointing at the manuals directory; markup lost in the archive]

in my main index.html page, all the Apache manuals get added by htdig. If I
take it out, they are omitted. The above anchor is not seen on the page
since there is no text to hyperlink, but it is a valid hyperlink.

So basically, hyperlink anchors are required for htdig to get from one page
to another, and you do mention that you have added these. I guess you could
also add the new directory's URL to your "start_url" parameter.

- Original Message -
From: "Rivera, Tony" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, October 05, 2000 12:28 PM
Subject: [htdig] Can't get my search to update correctly..


> Hello Everyone,
>
> I have what should be a *simple* question.  I am having a hard time
> getting htdig to update its search index.  I have my crontab set up to run
> htdig every night so it will catch changes made to the web server throughout
> the day.  However, that's not working...about 5 days ago I added a new
> directory /www/itss and have made numerous links to it from my index page
> and various other pages on the server and it is still not getting picked up
> when I do a search for it.
>
> I am not quite sure what I am missing here...I have read through all the
> archives on the site, ran /opt/www/htdig/bin/htdig -v -a -s and
> /opt/www/htdig/bin/htmerge -v -a -s but it still doesn't update.
>
> Basically all I want is for htdig to go out and update every night so that
> it will recognize any changes made to the web server earlier in the day.
>
> Thanks in advance from a newbie!
>
> Cheers,
> Tony
>
> 
> To unsubscribe from the htdig mailing list, send a message to
> [EMAIL PROTECTED]
> You will receive a message to confirm this.
> List archives:  
> FAQ:
>
>







Re: [htdig] Question about url_part_aliases

2000-10-05 Thread Ramon Gonzalez

Ok, got it to work now. I couldn't understand how I can have 2 parameters in
the same config file, but using "separate" ones made it work.

I'll have to make sure I have separate config files for digging and
searching to avoid conflicts!

I gave a small presentation of ht://dig to my boss today and he was very
pleased. His concern is the SSL problem and the lack of a 128-bit encryption
algorithm within htdig. I will be getting a test server soon with SSL to
test the suggestions I have been able to dig up from the archives.

Thanks!


- Original Message -
From: "Geoff Hutchison" <[EMAIL PROTECTED]>
To: "Ramon Gonzalez" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, October 05, 2000 8:33 AM
Subject: Re: [htdig] Question about url_part_aliases


> At 1:03 AM -0400 10/5/00, Ramon Gonzalez wrote:
> >bypass the web server. When the user does a search, I need to return all
> >URL's with https:// but this is not happening at all. And yes, I'm running
> >htdig (with the -i option) and htmerge. Any help is appreciated!
> >
> >Below are my config entries for my test web server:
> >
> >local_urls_only:true
> >local_urls: http://192.168.0.3/=/home/httpd/html/
> >start_url:  http://192.168.0.3/ http://192.168.0.3/manual/misc
> >url_part_aliases:   http:// *site
> >url_part_aliases:   https:// *site
>
> You misunderstood the documentation. You put one url_part_aliases
> attribute in the config file you use when indexing. Then you make
> another config file (which could just include the other one) and then
> put the *other* url_part_aliases attribute in that one. When you
> search, you specify the second one.
>
> e.g.
>
> htdig.conf:
> ...
> url_part_aliases:   http:// *site
>
> search.conf:
> include: htdig.conf
> url_part_aliases:   https:// *site
>
> This serves to keep the HTTP URLs when indexing (since the indexer
> cannot index HTTPS) and serve up HTTPS when searching.
>
> --
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/
>







RE: [htdig] puzzled by htdig

2000-10-05 Thread Geoff Hutchison

On Thu, 5 Oct 2000, GYGAX,OTTO (HP-Corvallis,ex1) wrote:

> My limit_urls_to key is set as you have it below (default). 
> My start_url is currently set to a list of urls such as http://<server>/,
> http://<server>/arch.html, http://<server>/dir1, http://<server>/dir2,
> http://<server>/dir3, ... where arch.html is a simple web page with a href
> pointer to http://<server>/~arch, the cover page to the Mhonarc mailing tree
> that contains links to every single mailing archive page.

OK, but then ~arch won't fall into the limits as you've set them (since
it's not any of the patterns in start_url). If you want to index all
documents on the server, you may want a more liberal limit_urls_to
directive, e.g.

limit_urls_to: http://<server>/

> Before I extended the start_url key attr., I only had http://<server>/ and
> http://<server>/arch.html, but htdig went as far as the few links off the
> server's index.html file, missing all other directories at the root. At one

OK, that was one of my points--it will follow the links it sees. So if you
index starting with http://server/ then it will follow links from
index.html. Unless you add those directories (as you did) to start_url, it
won't even know they're there.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







RE: [htdig] puzzled by htdig

2000-10-05 Thread GYGAX,OTTO (HP-Corvallis,ex1)

Thanks, Geoff, for getting back to me.

My limit_urls_to key is set as you have it below (default). 
My start_url is currently set to a list of urls such as http://<server>/,
http://<server>/arch.html, http://<server>/dir1, http://<server>/dir2,
http://<server>/dir3, ... where arch.html is a simple web page with a href
pointer to http://<server>/~arch, the cover page to the Mhonarc mailing tree
that contains links to every single mailing archive page.

Before I extended the start_url key attr., I only had http://<server>/ and
http://<server>/arch.html, but htdig went as far as the few links off the
server's index.html file, missing all other directories at the root. At one
point it somehow managed to follow the link in arch.html but that stopped and is
what I'm trying to resolve now.

By including all the other directories at the root I'm now getting a more
exhaustive database of search items as originally intended, but the one I really
need (~arch and the tree it points to) is still missing.

-otto


--
Otto A. Gygax ([EMAIL PROTECTED])
Digital Publishing Solutions, Software Development  
Hewlett-Packard, Corvallis, Oregon
ph: (541)715-9098 / fax: (541)715-4980 / cell: (541)602-3491


-Original Message-
From: Geoff Hutchison [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 04, 2000 7:30 PM
To: GYGAX,OTTO (HP-Corvallis,ex1)
Cc: '[EMAIL PROTECTED]'
Subject: Re: [htdig] puzzled by htdig


At 2:43 PM -0700 10/4/00, GYGAX,OTTO (HP-Corvallis,ex1) wrote:
>Now it won't work. htdig is able to look up other web pages that reside at the
>root of the web server but cannot traverse down to the ~arch tree.

There are a few points here and it is perhaps better to explain how 
htdig follows links rather than to directly address your question.

In the htdig.conf file, there are two key attributes for your question:
start_url: http://www.foo.com/
limit_urls_to: ${start_url}

As set, this would start indexing at www.foo.com and go from there. 
The limit_urls_to attribute requires that any URLs it finds match 
this pattern. In this case, this will limit indexing to everything 
inside this server. (You could, for example, just set it to "foo.com" 
to index all servers in that domain, etc.) But it will *only* follow 
links. So if you don't have a link from a file at the start_url to a 
certain file, it won't index it.
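
The link-following rule described above can be sketched as a toy crawl over
an in-memory link table. This is only an illustration, not htdig's code: the
crawl() function, the sample URLs under www.foo.com and the "links" table are
made up for the example, and limit_urls_to is modelled as a simple substring
match.

```python
from collections import deque

def crawl(start_urls, limit_urls_to, links):
    """Return the set of URLs reached by following links from start_urls.

    links is a dict mapping a URL to the URLs it links to (a stand-in for
    fetching and parsing real pages). A URL is accepted only if it contains
    at least one of the limit_urls_to patterns.
    """
    def allowed(url):
        return any(pattern in url for pattern in limit_urls_to)

    seen = set()
    queue = deque(url for url in start_urls if allowed(url))
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        for target in links.get(url, []):
            if allowed(target) and target not in seen:
                queue.append(target)
    return seen

# Hypothetical site: orphan.html exists on the server but nothing links to it.
site = {
    "http://www.foo.com/": ["http://www.foo.com/a.html", "http://www.bar.com/"],
    "http://www.foo.com/a.html": ["http://www.foo.com/b.html"],
    "http://www.foo.com/orphan.html": [],
}
found = crawl(["http://www.foo.com/"], ["http://www.foo.com/"], site)
```

In this sketch the page on www.bar.com is dropped by the limit, and
orphan.html is never reached because no crawled page links to it. Adding it
to start_urls would be the only way to pick it up.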

Your example is a little unclear to me. My guess is that you are 
either not using limit_urls_to correctly or you don't have working 
links to the files you're trying to index.

For more information:
http://www.htdig.org/attrs.html#start_url
http://www.htdig.org/attrs.html#limit_urls_to

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






[htdig] header.html - fixed!

2000-10-05 Thread Steve Murray

Thanks to the group for their suggestions.

It turned out I had the path right, but I put it in the wrong place in htdig.conf!

I had it right under "database_dir:"

When I moved the path to the very bottom of the doc it worked. (Don't ask me why!)

Steve







Re: [htdig] Can't get my search to update correctly..

2000-10-05 Thread Gilles Detillieux

According to Rivera, Tony:
> However, that's not working...about 5 days ago I added a new
> directory /www/itss and have made numerous links to it from my index page
> and various other pages on the server and it is still not getting picked up
> when I do a search for it.

I assume these are HTML links and not JavaScript ones.  One possibility is
that the pages you modified are actually dynamic content (SSI, PHP, etc.)
and so the server isn't returning a Last-Modified header.  If this is the
case, htdig won't realize the pages have been modified.  You can set the
modification_time_is_now attribute to true, but then htdig will reindex
all dynamic pages every time it runs.
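
As a rough model of the behaviour described above (not htdig's actual code),
the update decision could be sketched like this. The should_reindex()
function and its parameters are invented for the illustration; only the
attribute name modification_time_is_now comes from htdig.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def should_reindex(last_modified_header, indexed_at,
                   modification_time_is_now=False, now=None):
    """Decide whether a page looks newer than the copy already indexed.

    last_modified_header is the raw Last-Modified value, or None when the
    server did not send one (typical for SSI, PHP and other dynamic pages).
    indexed_at is a timezone-aware datetime of the previous indexing run.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if last_modified_header is None:
        if not modification_time_is_now:
            # No date to compare against, so the change goes unnoticed.
            return False
        # Treat the page as modified right now, which is always newer.
        modified = now
    else:
        modified = parsedate_to_datetime(last_modified_header)
    return modified > indexed_at
```

With the flag off, a page without a Last-Modified header is never seen as
changed; with it on, such a page is reindexed on every run, which is the
trade-off mentioned above.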

> I am not quite sure what I am missing here...I have read through all the
> archives on the site, ran /opt/www/htdig/bin/htdig -v -a -s and
> /opt/www/htdig/bin/htmerge -v -a -s but it still doesn't update.

I assume you know about how to maintain the .work files and non-.work
files before and after running htdig and htmerge with the -a option?
If you're not copying the updated .work files to their non-.work locations,
then htsearch won't see the updates.  See the contrib/examples/rundig.sh
script for an example of using the -a option for updates.
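
For illustration only, the copy step could look something like the sketch
below. It is not the contents of rundig.sh: promote_work_files() is a
made-up helper, and it simply assumes that every file in the database
directory ending in ".work" should replace the file of the same name without
that suffix.

```python
import shutil
from pathlib import Path

def promote_work_files(database_dir):
    """Copy each '<name>.work' file over '<name>' and return the names updated."""
    promoted = []
    for work_file in sorted(Path(database_dir).glob("*.work")):
        live_file = work_file.with_suffix("")  # drop the trailing ".work"
        shutil.copy2(work_file, live_file)
        promoted.append(live_file.name)
    return promoted
```

A nightly cron job would run htdig and htmerge with -a first and call
something like this only after both have finished successfully, so htsearch
never reads a half-built database.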

-- 
Gilles R. Detillieux  E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






Re: [htdig] server_aliases

2000-10-05 Thread Gilles Detillieux

According to Malcolm Austen:
> Thanks Gilles. I'm afraid I had not noticed the replicated server name in
> the example; I read it to deduce the syntax and then assumed that all the
> semantics would be in the text. I think it's worth noting this explicitly
> in the explanatory text.

Yes, I just added a note about that to the 3.2 documentation.  I think I'll
migrate it back to the current docs on the web site as well.

> + In 3.1.5, the port numbers are optional.  I'll make a note of that for
> + 3.2's documentation.
> 
> Thanks, I'll drop some of my port numbers then. My search of the archives
> found a message stating that the port numbers only become optional in
> version 3.2. Since some people around here like to run public services on
> non-standard port numbers, leaving them out may save me a few
> configuration lines in the future 8-).

The 3.2 development has been ongoing for close to two years now.
Last year, up to the 3.1.5 release in February, I was looking over at
the 3.2 development code and backporting fixes and features that migrated
easily to 3.1, so many of the changes since 3.1.2 have been things that
were supposedly only to come in 3.2.

However, you should note that if the port number is omitted, it defaults
to 80.  If you're using non-standard port numbers, you should specify them
explicitly.
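
A small sketch of the idea, assuming the "alias=canonical" syntax discussed
in this thread and the port-80 default mentioned above. The two functions
are invented for the example and do not mirror htdig's implementation.

```python
def parse_server_aliases(value, default_port=80):
    """Parse 'alias[:port]=canonical[:port] ...' into a lookup table."""
    def with_port(server):
        host, sep, port = server.partition(":")
        return (host.lower(), int(port) if sep else default_port)

    aliases = {}
    for pair in value.split():
        alias, canonical = pair.split("=", 1)
        # The mapping goes left to right: alias -> canonical.
        aliases[with_port(alias)] = with_port(canonical)
    return aliases

def canonical_server(host, port, aliases):
    """Return the (host, port) to use for a server, following any alias."""
    key = (host.lower(), port)
    return aliases.get(key, key)

aliases = parse_server_aliases("aaa:80=bbb:80 ccc=bbb")
```

Here "ccc=bbb" behaves like "ccc:80=bbb:80", so a server answering on
ccc:8080 is not covered by the alias and would need its own explicit entry.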

-- 
Gilles R. Detillieux  E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






[htdig] Can't get my search to update correctly..

2000-10-05 Thread Rivera, Tony

Hello Everyone,

I have what should be a *simple* question.  I am having a hard time
getting htdig to update its search index.  I have my crontab set up to run
htdig every night so it will catch changes made to the web server throughout
the day.  However, that's not working...about 5 days ago I added a new
directory /www/itss and have made numerous links to it from my index page
and various other pages on the server and it is still not getting picked up
when I do a search for it.

I am not quite sure what I am missing here...I have read through all the
archives on the site, ran /opt/www/htdig/bin/htdig -v -a -s and
/opt/www/htdig/bin/htmerge -v -a -s but it still doesn't update.

Basically all I want is for htdig to go out and update every night so that
it will recognize any changes made to the web server earlier in the day.

Thanks in advance from a newbie!

Cheers,
Tony






Re: [htdig] Registers when I index one file.

2000-10-05 Thread Geoff Hutchison

At 11:09 AM +0200 10/5/00, Jose Antonio Gómez wrote:
>I index this file with htdig, but when I search for one member, for
>example "ALFREDO", htdig returns the whole file, and I want it to return
>only that member's record:
>Usuario: ALFREDO EDUARDO ALVAREZ GARCIA  (Personal de Administración y
>Servicios )
>email: [EMAIL PROTECTED]
>
>How can I do that? Thanks very much.

You can't. If you have <a name> tags in the file, it will go to
the first match in the file, but you cannot break a file into parts.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re: [htdig] Question about url_part_aliases

2000-10-05 Thread Geoff Hutchison

At 1:03 AM -0400 10/5/00, Ramon Gonzalez wrote:
>bypass the web server. When the user does a search, I need to return all
>URL's with https:// but this is not happening at all. And yes, I'm running
>htdig (with the -i option) and htmerge. Any help is appreciated!
>
>Below are my config entries for my test web server:
>
>local_urls_only:true
>local_urls: http://192.168.0.3/=/home/httpd/html/
>start_url:  http://192.168.0.3/ http://192.168.0.3/manual/misc
>url_part_aliases:   http:// *site
>url_part_aliases:   https:// *site

You misunderstood the documentation. You put one url_part_aliases 
attribute in the config file you use when indexing. Then you make 
another config file (which could just include the other one) and then 
put the *other* url_part_aliases attribute in that one. When you 
search, you specify the second one.

e.g.

htdig.conf:
...
url_part_aliases:   http:// *site

search.conf:
include: htdig.conf
url_part_aliases:   https:// *site

This serves to keep the HTTP URLs when indexing (since the indexer 
cannot index HTTPS) and serve up HTTPS when searching.
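
The effect of the two config files can be sketched as a pair of string
translations: one applied when a URL is written to the database, the other
when it is read back. The helper functions below are hypothetical and the
plain string replacement is a simplification; only the attribute values come
from the configs above.

```python
def parse_aliases(value):
    """Parse 'from to from to ...' into a list of (from, to) pairs."""
    tokens = value.split()
    return list(zip(tokens[0::2], tokens[1::2]))

def encode(url, aliases):
    """Replace each 'from' string with its alias before storing the URL."""
    for original, alias in aliases:
        url = url.replace(original, alias)
    return url

def decode(stored, aliases):
    """Expand each alias back into its 'from' string when reading the URL."""
    for original, alias in aliases:
        stored = stored.replace(alias, original)
    return stored

index_aliases = parse_aliases("http:// *site")    # as in htdig.conf
search_aliases = parse_aliases("https:// *site")  # as in search.conf

stored = encode("http://192.168.0.3/manual/misc", index_aliases)
shown = decode(stored, search_aliases)
```

Because both configs share the "*site" alias, a URL stored from an http://
crawl comes back out as https:// at search time. Putting both lines in one
config file gives the indexer and the searcher the same ambiguous mapping,
which is why two files are needed.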

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re[3]: [htdig] ASCIIfy patch <------- SOLVED

2000-10-05 Thread Andoni Ayala

Hi!

My problem is solved.
It seems the cache was showing me the previous page that htdig had generated.




On Thu, 05 Oct 2000 10:35:35 +0200
Andoni Ayala <[EMAIL PROTECTED]> wrote:

> >
>
> Yes, I already applied this patch
> and I ran rundig with the accents fuzzy algorithm, but when I search for
> "jamón" it seems to be treated as a different search from "jamon".
>
>







Re: [htdig] server_aliases

2000-10-05 Thread Malcolm Austen

On Wed, 4 Oct 2000, Gilles Detillieux wrote:

+ According to Malcolm Austen:
+ > server_aliases: aaa:80=bbb:80
+ > will it be "aaa" or "bbb" in the database?
+ 
+ The mapping goes left to right, so bbb would be the one used in
+ the database, and even when htdig fetches the documents.  I thought
+ the example in http://www.htdig.org/attrs.html#server_aliases made it
+ abundantly clear, as it contains two mappings to the same canonical name.
+ It wouldn't make sense to map one name to two different names, would it?

Thanks Gilles. I'm afraid I had not noticed the replicated server name in
the example; I read it to deduce the syntax and then assumed that all the
semantics would be in the text. I think it's worth noting this explicitly
in the explanatory text.

+ In 3.1.5, the port numbers are optional.  I'll make a note of that for
+ 3.2's documentation.

Thanks, I'll drop some of my port numbers then. My search of the archives
found a message stating that the port numbers only become optional in
version 3.2. Since some people around here like to run public services on
non-standard port numbers, leaving them out may save me a few
configuration lines in the future 8-).

regards,
Malcolm.
+
| Malcolm Austen,  Tel: +44(0) 1865 273216
| Oxford University Computing Services,Fax: +44(0) 1865 273275
| 13 Banbury Road,   Email -  [EMAIL PROTECTED]
| Oxford, OX2 6NN, England   WWW - http://users.ox.ac.uk/~malcolm/
+







[htdig] Registers when I index one file.

2000-10-05 Thread Jose Antonio Gómez

Hello,

I have one file listing all the members of my institution, for example:


Usuario: ALFREDO EDUARDO ALVAREZ GARCIA  (Personal de Administración y
Servicios )
email: [EMAIL PROTECTED]
:
Usuario: ALFREDO FERNANDEZ HERNANZ (páginas.
Personal
de Administración y Servicios)
email: [EMAIL PROTECTED]
:
Usuario: ALICIA MIRON  (Personal de Administración y Servicios)
email: [EMAIL PROTECTED]
:



I index this file with htdig, but when I search for one member, for
example "ALFREDO", htdig returns the whole file, and I want it to return
only that member's record:
Usuario: ALFREDO EDUARDO ALVAREZ GARCIA  (Personal de Administración y
Servicios )
email: [EMAIL PROTECTED]

How can I do that? Thanks very much.
--
...
Jose Antonio Gómez   91- 336 65 68
Centro de Cálculo
Escuela Superior de Arquitectura
Universidad Politécnica de Madrid








Re[2]: [htdig] ASCIIfy patch

2000-10-05 Thread Andoni Ayala


On Wed, 4 Oct 2000 12:35:53 -0500 (CDT)
Gilles Detillieux <[EMAIL PROTECTED]> wrote:

> >
> > I would like the engine to search for both café and cafe when people
> > search for cafe (for example)
>
> There are a couple different patches for dealing with accents.  The one
> which I think is preferable adds an "accents" fuzzy match method to
> htsearch, but still retains the accents in the database.  This is also
> the method that was added to the 3.2.0b2 beta.
>
> ftp://ftp.ccsf.org/htdig-patches/3.1.5/accents.5
>

Yes, I already applied this patch
and I ran rundig with the accents fuzzy algorithm, but when I search for
"jamón" it seems to be treated as a different search from "jamon".







Re: [htdig] ... but not changed

2000-10-05 Thread David Adams

> 
> According to Geoff Hutchison:
> > On Wed, 4 Oct 2000, David Adams wrote:
> > > It had not occurred to me that an SSI file was "dynamic", I live and learn!
> > 
> > Yes, if you think about it for a second, you can realize that there's no
> > way for the server to be entirely sure of the modification date for SSI
> > files. It *could* send the date of the file itself, but what if it
> > includes a file that has changed, or an actual CGI?
> 
> Well, if the server were REALLY smart about it, it could keep track of
> the most recently modified include file or main file, and use that as
> the last modified date.  It would only need to suppress the header if
> CGI output is included in the mix.  Of course, the latter case would
> probably account for about 90% of SSI usage.  :)
> 
> -- 
> Gilles R. Detillieux  E-mail: <[EMAIL PROTECTED]>
> Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
> Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930
> 

We certainly don't allow CGI output in SSI on our server, but I've no way
of knowing if that is unusual.
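
The "really smart server" idea quoted above can be written down as a short
sketch. It is purely hypothetical: effective_last_modified() is not part of
any web server, and timestamps are plain numbers (seconds since the epoch)
to keep the example small.

```python
def effective_last_modified(main_mtime, include_mtimes, includes_cgi_output):
    """Return the Last-Modified time an SSI page could report, or None.

    The page is as new as the newest of the main file and its included
    files. If any CGI output is included, no reliable date exists, so the
    header would have to be suppressed (None).
    """
    if includes_cgi_output:
        return None
    return max([main_mtime, *include_mtimes])
```

On a server that allows only static includes, as described above, the
function never returns None, so every SSI page could carry a usable
Last-Modified date.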

-- 
 
David Adams
<[EMAIL PROTECTED]>
Computing Services
University of Southampton

