Re: [htdig] Words and files not being found or indexed

2000-12-18 Thread Gilles Detillieux


According to crosstar:
> This is a message for Gilles or anyone who is "senior" enough with the
> program to answer.
> 
> I had written to Gilles, earlier, and he had said to post the questions here.

I've been away from work and e-mail since last Wednesday, so I didn't get
caught up on this thread until today.  You see, there's a reason why I
always redirect people to the list!  I'm rather glad I missed this thread,
actually, as the whole thing seems to have been an exercise in frustration.

From the outset, I referred you to FAQ 5.25, but I didn't see any evidence
from this whole, very long thread of discussion that anyone had looked at
or followed the suggestions there.  Was the language used in that question
so indecipherable that no one could get anything from it?  I realize it's
written in technical language, but setting up a search engine correctly is
a pretty technical problem, so if you don't understand the basics of Unix
or Linux and how web servers work, you really should read up on that before
attempting something like this.

Anyway, if anyone can contribute suggestions as to how this FAQ entry can
be better written, I'd be glad to hear them.  If the problem was simply
that no one bothered to look at the FAQ, then why am I wasting my time
trying to update it?

-- 
Gilles R. Detillieux  E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930


To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  
FAQ:

RE: [htdig] Words and files not being found or indexed

2000-12-15 Thread Heriberto Cantu


Ok.

I suppose that your web server runs Linux or Unix,
and that you can log in to it and get a shell prompt
for your account, like this:

$ _

I also suppose that you have permission to write a file in the main
directory of your server.

Then you change to that directory with the command:
$ cd /home/httpd/html

I suppose that this is the main directory of your web server.

Now I am going to create an HTML file that contains links to all files in
all directories below the main directory /home/httpd/html, and I'm going
to name it all_links_of_my_web_site.html, with this command:

$ find . -type f -print | awk '{ print "<a href=\"" $0 "\">" $0 "</a><br>" }' > all_links_of_my_web_site.html

That command must be typed on one line, and the $ symbol is your prompt.

Check the size of the resulting file and, if necessary, raise
max_doc_size in htdig.conf so that htdig reads the file to the end.

Now you have a file that points to every file in your site.  Finally,
tell htdig to index this file by naming it in start_url:

start_url:  http://your.domain/all_links_of_my_web_site.html
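The "check the file size and adjust max_doc_size" step can be scripted. A minimal sketch, assuming a POSIX shell; the function name suggest_max_doc_size and the 10% headroom are hypothetical choices, not htdig defaults.

```sh
#!/bin/sh
# Hypothetical helper: print a max_doc_size line large enough for a file.
# The 10% headroom is an arbitrary assumption so the links page can grow a
# little before htdig starts truncating it again.
suggest_max_doc_size() {
    # wc -c prints the size in bytes; tr strips the padding some wc versions add
    size=$(wc -c < "$1" | tr -d ' ')
    echo "max_doc_size: $((size + size / 10 + 1))"
}
```

Run it as `suggest_max_doc_size all_links_of_my_web_site.html` and copy the printed line into htdig.conf.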

Good Luck


Heriberto Cantu
http://www.elinux.com.mx
Monterrey, Mexico
Tel: (8)129-1121
Cel: 0448-256-8807



RE: [htdig] Words and files not being found or indexed

2000-12-15 Thread crosstar


Trying to understand the last message from Heriberto:

Are you saying that you can create a file which contains all URLs for the site and
thereby aid in indexing the site?

What does $ find . -depth -print mean?  Is it supposed to be typed somewhere?
If so, where?

When you say "complete path to file" do you actually mean "files" (plural)?
  
Where would you enter or use it?  In the server?  Under a given sub-directory?
At the prompt on the browser?

What does "awk" mean?  

What do you mean by "pipe the output"?

What do you mean by "print the link"?

What is the significance of "$1?

What is someone supposed to do with this?  Type it?  Insert it?
If so, where?

If you could be more specific, I'll try to follow.

Thanks.


-
The Nationalist Movement
PO Box 2000
Learned MS 39154
(601) 885-2288
Clinic: http://www.nationalist.org/board/html/index.php
Crosstarlist: http://www.nationalist.org/docs/resources/list.html
E-mail: mailto:[EMAIL PROTECTED]
Forum: http://www.nationalist.org/forum/index.php
Home Page: http://www.nationalist.org
ICQ: 5429992
Newsgroup: alt.national
Views not necessarily those of The Nationalist Movement
© 2000 by The Nationalist Movement
-

END




RE: [htdig] Words and files not being found or indexed

2000-12-15 Thread Heriberto Cantu


Maybe a better idea is to use find to create such a file:
$ find . -depth -print

That gives you the complete path to every file.

You just need to pipe the output to awk and print each path ($1)
inside an HTML link.

Good Luck


RE: [htdig] Words and files not being found or indexed

2000-12-15 Thread crosstar

Hi, Dunk:

If I understand your suggestion, it seems to be what I was seeking (if
it is possible).

The limitation for htdig, as I perceive it, is that it presently searches by way
of links, under html.

In my case (which I didn't realize, at the time), my entire site is based on
java-applet links.  I had assumed (woe is me) that htdig indexed the files and
directory-structure based on what existed on the server.

Now, I realize that unless I have html links starting from top to bottom, I
will miss a lot of stuff which htdig will simply ignore.  With thousands of
files, this is a problem, for us.

I am told that it would take months to accomplish a rewrite, which would
be tantamount to a redesign of our entire site.  About 40% of our pages
are currently not being indexed and searched by htdig.

Our admins discussed "aesthetics" versus "functionality" and decided
to stay with the java-applet indexes and links, for now.  They said that
they looked better and, after all, not too many people use htdig now, anyhow,
on our site (although I would like that to change).

So, anything which would improve the search capability by allowing us to 
use our structure from the server, somehow (and not have to
rely on html links), would be welcome.  It's a great program, no doubt, 
even if not able to do all we might like, however.

Thanks. 


RE: [htdig] Words and files not being found or indexed

2000-12-15 Thread Duncan Brannen


At 07:59 15/12/2000 -0600, Geoff Hutchison wrote:
>At 9:09 AM + 12/15/00, Duncan Brannen wrote:
>>Maybe a Feature Request? - Ability to give a start directory
>>and index the files in the directory tree? (Or was that another ht:// 
>>product)
>
>You can't do this in HTTP unless the server spits up directory listings. 
>It's simply that htdig can't know more than the server spits up.

I meant to do it via the file system rather than going through a web 
server, give it a document root and traverse down not necessarily
following hyperlinks but instead the directory structure.

 Dunk



>On the other hand, in the 3.2 code, there's support for the file:// 
>"protocol" and in the 3.2.0b3 snapshots, it automatically generates 
>directory listings.

RE: [htdig] Words and files not being found or indexed

2000-12-15 Thread Geoff Hutchison

At 9:09 AM + 12/15/00, Duncan Brannen wrote:
>Maybe a Feature Request? - Ability to give a start directory
>and index the files in the directory tree? (Or was that another ht:// product)

You can't do this in HTTP unless the server spits up directory 
listings. It's simply that htdig can't know more than the server 
spits up.

On the other hand, in the 3.2 code, there's support for the file:// 
"protocol" and in the 3.2.0b3 snapshots, it automatically generates 
directory listings.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


RE: [htdig] Words and files not being found or indexed

2000-12-15 Thread Duncan Brannen

At 18:46 14/12/2000 -0500, Geoff Hutchison wrote:
>You can list as many URLs as you want in the start_url attribute, or you
>can also include a file into the htdig.conf. e.g.:
>
>start_url: `/path/to/urls.txt`

I guess this would be the way to do it; excuse me if I'm stating the obvious.

Go to your root directory (for your web docs), e.g. /news/archive, and run
ls -R > temp.file

You'll get, e.g.:

/news/archive:

file1   file2   file3

Write a short script to parse temp.file:

find a line that ends in :
strip the :

write to urls.txt

the line that ended in : (minus the colon)/file1
the line that ended in : (minus the colon)/file2
...
until you find another line that ends in :

Actually I think there's a far easier way to do this
in perl, but I can't think of it off the top of my head.
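The parsing Dunk describes fits in a few lines of awk. A sketch, assuming `ls -R > temp.file` was run from inside the document root (so the directory headers look like `.:` and `./news:`) and that http://your.domain is a placeholder for the URL of that root. When ls writes to a file it prints one name per line rather than the columns shown above. Subdirectory names are emitted as URLs too; filter them out if your server does not serve directory index pages. The function name lsr_to_urls is hypothetical.

```sh
#!/bin/sh
# Sketch: turn "ls -R" output (read on stdin) into one URL per line.
# $1 is the URL that corresponds to the directory ls -R was run from
# (hypothetical, e.g. http://your.domain).
lsr_to_urls() {
    awk -v base="$1" '
        /:$/ {                                  # directory header, e.g. "./news:"
            dir = substr($0, 1, length($0) - 1) # strip the trailing colon
            sub(/^\.\/?/, "", dir)              # "." -> "", "./news" -> "news"
            next
        }
        NF == 0 { next }                        # blank line between directories
        {
            path = (dir == "") ? $0 : (dir "/" $0)
            print base "/" path
        }
    '
}
```

Typical use: `lsr_to_urls http://your.domain < temp.file > urls.txt`, after which urls.txt can be pulled into htdig.conf with the backquoted start_url form Geoff mentions.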

Maybe a Feature Request? - Ability to give a start directory
and index the files in the directory tree? (Or was that another ht:// product)

 Dunk


RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread Geoff Hutchison

On Thu, 14 Dec 2000, crosstar wrote:

> I have never run htmerge, so I thought maybe I would try
> that (possibly to solve the problem of my files not being indexed).

You must run htmerge to get usable databases. The rundig script runs
htdig, htmerge and moves files around (look at the file itself for more
details).

> However, I received the following error:
> 
> /var:  warning, user disk quota exceeded

My guess is that the /tmp directory is actually on /var. When merging,
temporary files are needed (quite a lot, actually). These files are put
into the directory specified by the TMPDIR environment variable, or /tmp
if TMPDIR is not defined.

The rundig script sets TMPDIR to the same directory as your databases
before running. Alternatively, you can point it to someplace where there
won't be any sort of quota.
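Following Geoff's suggestion, here is a sketch of running htmerge with TMPDIR pointed at a directory of your choice, the way he says rundig does. The function name and paths are hypothetical placeholders; `-c` selects the configuration file in the 3.1-era tools, but check the usage message of your htmerge version.

```sh
#!/bin/sh
# Run htmerge with its temporary files in a chosen directory instead of /tmp.
# $1: directory with enough free space and no quota (e.g. the database dir)
# $2: htdig configuration file
run_htmerge_with_tmpdir() {
    (
        # Subshell, so the TMPDIR change does not leak into the caller
        TMPDIR=$1
        export TMPDIR
        htmerge -c "$2"
    )
}
```

For example: `run_htmerge_with_tmpdir /path/to/your/db /path/to/htdig.conf`.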


RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread Geoff Hutchison

On Thu, 14 Dec 2000, crosstar wrote:

> The last message I got from Fred indicated that htdig would not index
> and find files unless they are listed in some upper-level file which
> includes an  We do have some links by way of a java applet directory;  however, Fred
> indicates that that will not work in this application.

That won't work for any search engine--there's no way to know that that
particular applet is going to give links. (Beyond that, it won't work for
people who don't turn on Java.)

> I have tried listing the absolute URL path-statements to where these
> files are located, as a test (listing these in the htdig.conf file).  
> But this has not been successful.

You can list as many URLs as you want in the start_url attribute, or you
can also include a file into the htdig.conf. e.g.:

start_url: `/path/to/urls.txt`


RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar


In the last message from Fred, I notice that he said
that he ran rundig and htmerge.

I have never run htmerge, so I thought maybe I would try
that (possibly to solve the problem of my files not being indexed).

However, I received the following error:

/var:  warning, user disk quota exceeded

Our administrator says that htdig may be trying to access
the /var directory where e-mail is stored (and which is excluded
from access).

He says that we are nowhere near exceeding any quotas, so he does not
understand the error.

So, should we attempt to place a restriction in htdig.conf?  Or something
else?  Or, do we not need to run htmerge at all, maybe?

Thanks.

 

RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar


This is a message for Gilles or anyone who is "senior" enough with the
program to answer.

I had written to Gilles, earlier, and he had said to post the questions here.

The last message I got from Fred indicated that htdig would not index and find
files unless they are listed in some upper-level file which includes
an mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 14, 2000 8:33 AM
To: [EMAIL PROTECTED]
Cc: Matthes, Fred
Subject: RE: [htdig] Words and files not being found or indexed


Hi, Fred:

Well, I tried adding URLs and path statements, and no luck.

Let me give a quick example.

I have a file called rogues.html

The URL is:

/news/archives/2000/rogues.html

In this file is information about "rogues."  The title, even, is
"Rogues Gallery."

However, when I do a keyword search with htdig, the file or text
does not come up.  And this appears to be the same with most files
on the site.

Now, there is a file called archives.html

located at:

/news/archives/

which lists rogues.html in a java applet (as well as all files in that
particular sub-directory):

Rogues' Gallery;right_frame;/news/archives/2000/rogues.html|

That is the only "link" to the file, however.
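Since a spider will not run the applet, one workaround is to copy the applet's menu data into an ordinary HTML page that htdig can follow. A sketch, assuming (from the single entry quoted above only) that each entry ends with `|` and has the form `title;target;url`; the function name applet_menu_to_links and the file names in the usage line are hypothetical.

```sh
#!/bin/sh
# Convert applet menu entries such as
#   Rogues' Gallery;right_frame;/news/archives/2000/rogues.html|
# (read on stdin) into plain HTML links on stdout.
applet_menu_to_links() {
    tr '|' '\n' | awk -F';' '
        NF >= 3 && $3 != "" {
            title = $1
            gsub(/&/, "\\&amp;", title)   # escape HTML special characters
            gsub(/</, "\\&lt;", title)
            gsub(/>/, "\\&gt;", title)
            printf "<a href=\"%s\">%s</a><br>\n", $3, title
        }
    '
}
```

For example, `applet_menu_to_links < menu.txt > applet_links.html`, then list the URL of applet_links.html in start_url.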

Does this help you?

Is there any solution... or do I need to abandon the project
(I would hate to after coming so far!).

I am embarrassed I have to ask all these questions, but
thanks for bearing with me.


 


Hi, thanks, Fred!

I'm fighting a splitting headache over this, so pardon if I am a bit dense.

OK, I think I see the THEORY you are speaking of.  It's still a bit fuzzy in
my mind.

Nonetheless, the question now is:  What is the practical SOLUTION (if any)?

In other words, what can I do to have my files indexed and found?

Meanwhile, I am attempting to enter more path and URL statements in
the htdig.conf file and running rundig, once again.

I'll then test to see if more files and text were picked up that way.

If you could kindly give me a specific solution (such as "type this"
or "add this") that would be a big help.  Or, possibly refer me to some
example.  That is about all I can do, insofar as I am not very technical.

We do have links from an "index" of sorts in many sub-directories,
but it is contained in a java applet and I have doubts that it would be
picked up by the "spider" (although I am not sure and am testing now
to determine this).

Thanks, again.  I really would like to use this thing!




Let me try to help here.  This is my first time writing to this type of
mailing list, so please forgive me.

Jason (I hope that I get these names correct), you are talking and thinking
in terms of directory structures: starting at the directory where your home
page is and perusing files.

Dennis is talking about web structures, which I think is how htdig works.
Imagine a spider crawling around a web perusing the files (links) that it
discovers.  It 'recurses' down each link it discovers until it has followed
all possible links.  The only difference between this and the directory
tree walk is that you have to keep track of the links you've visited, since
many pages might have a link back to your home page, for example.  If you
blindly followed all links, you'd just go round and round.

Now there are many webs out there, just as there are many directories on
your disks.  You start somewhere in a directory tree by supplying a
directory to a program that walks through (recurses) the tree visiting all
the leaves below that directory.
 
You start a spider on a particular web by supplying a url.  The spider
begins to walk a web branch (recurses) when it discovers a link on the
current page.  When that branch is exhausted, it then continues until it
discovers another link (url).  It does this until it has walked all links in
that particular web.
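The spider walk described above (follow each link once, remembering what has been visited so that links back to the home page do not loop forever) is a breadth-first search. A toy sketch: instead of fetching pages over HTTP, it reads a hypothetical link table with one `page link` pair per line; the function name crawl is also hypothetical.

```sh
#!/bin/sh
# Toy spider: breadth-first walk over a link table instead of real HTTP.
# stdin: one "page link" pair per line (hypothetical format)
# $1:    the start page (what start_url is to htdig)
crawl() {
    awk -v start="$1" '
        NF >= 2 { n[$1]++; out[$1, n[$1]] = $2 }   # adjacency list per page
        END {
            head = 1; tail = 1
            queue[tail++] = start
            while (head < tail) {
                page = queue[head++]
                if (page in seen) continue          # already visited: avoid looping
                seen[page] = 1
                print page
                for (i = 1; i <= n[page]; i++) {
                    target = out[page, i]
                    if (!(target in seen)) queue[tail++] = target
                }
            }
        }
    '
}
```

If the table includes a line such as `/news/archives/2000/rogues.html /index.html` but no page links *to* rogues.html, that page never appears in the output, however many other pages it links to.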

Now, just because you have files in the directory you use for your web
files does not necessarily mean that those files are on your web.  Yes, you
can access them with a browser.  But can you go to one page and, just by
clicking on hyperlinks, visit all of these files?  As soon as you have to
type in a url, you have discovered a file that htdig will not find.

These files do not have to be listed on the home page.  But they do need to
be accessible via links on your web site.  They have to be in a url in one
of the pages that the spider began crawling through.  I define a web site as
all of those files connected to some page.  Usually, I think most people
think of that page as the home page, but it doesn't have to be.  So the
files that you want htdig to find must appear somewhere on the web in a
link.
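To find the files in that category, one rough check is to compare the list of files on disk with the list of pages the spider actually reached. A sketch, assuming you already have both lists as one path per line; how you obtain the reached list depends on your setup and is not shown, and the function name unreached is hypothetical.

```sh
#!/bin/sh
# Print paths that exist on disk but were never reached by the spider.
# $1: file with the paths the spider reached, one per line
# $2: file with the paths present on disk, one per line
unreached() {
    tmp=${TMPDIR:-/tmp}/unreached.$$
    sort -u "$1" > "$tmp"
    # comm -13 keeps only lines unique to the second input (the disk list)
    sort -u "$2" | comm -13 "$tmp" -
    rm -f "$tmp"
}
```

The disk list could come from something like `(cd /home/httpd/html && find . -type f | sed 's|^\.||') > ondisk.txt`, which turns `./news/x.html` into `/news/x.html`.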

I hope that this helps.

-Original Message-
From: crosstar [mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 14, 2000 7:39 AM
To: [EMAIL PROTECTED]
Cc: Dennis Director
Subject: Re: [htdig] Words and files not being found or indexed


Thanks, Dennis:

Let me make this quick, then.

We have approxima

RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar


Hi, Fred:

Would this be a possible solution to get all the files and text indexed?

What if I list all URLs down to the deepest level in the config file
at:

start_url:

such as:

/news/archives/2000/dec/

Or, should I possibly add the index file to it, as well, in htdig.conf,
such as:

/news/archives/2000/dec/archives.html

I am just at my wits' end, at this point.

Thanks.

PS We have 25 years filed and thousands of files and hundreds of
sub-directories, so I am wondering if this is really going to work for us?



 


RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread Matthes, Fred

Well, I'm just digging into htdig to solve my own problem.  I seriously
doubt that any spider is going to discover a url by executing a Java applet.
I think it has to be in the form of [...]
Sent: Thursday, December 14, 2000 8:33 AM
To: [EMAIL PROTECTED]
Cc: Matthes, Fred
Subject: RE: [htdig] Words and files not being found or indexed

Hi, Fred:

Well, I tried adding URLs and path statements, and no luck.

Let me give a quick example.

I have a file called rogues.html

The URL is:

/news/archives/2000/rogues.html

In this file is information about "rogues."  The title, even, is
"Rogues Gallery."

However, when I do a keyword search with htdig, the file or text
does not come up.  And this appears to be the same with most files
on the site.

Now, there is a file called archives.html

located at:

/news/archives/

which lists rogues.html in a Java applet (as well as all files in that
particular sub-directory):

Rogues' Gallery;right_frame;/news/archives/2000/rogues.html|

That is the only "link" to the file, however.

Does this help you?

Is there any solution... or do I need to abandon the project
(I would hate to after coming so far!).

I am embarrassed I have to ask all these questions, but
thanks for bearing with me.

RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar


Hi, Fred:

Well, I tried adding URLs and path statements, and no luck.

Let me give a quick example.

I have a file called rogues.html

The URL is:

/news/archives/2000/rogues.html

In this file is information about "rogues."  The title, even, is
"Rogues Gallery."

However, when I do a keyword search with htdig, the file or text
does not come up.  And this appears to be the same with most files
on the site.

Now, there is a file called archives.html

located at:

/news/archives/

which lists rogues.html in a Java applet (as well as all files in that
particular sub-directory):

Rogues' Gallery;right_frame;/news/archives/2000/rogues.html|

That is the only "link" to the file, however.
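The applet line above appears to pack each entry as `title;target;url`, with entries separated by `|`; that format is inferred from this single example and may not be exact.  Since a spider is not expected to read URLs out of an applet's parameters, one workaround is to publish the same list as ordinary HTML links on a page htdig can reach.  A minimal Python sketch; the function names are hypothetical:

```python
import html


def applet_entries_to_links(param_value: str) -> str:
    """Convert an assumed 'title;target;url|title;target;url|...' string
    into plain HTML anchors, one per line."""
    anchors = []
    for entry in param_value.split("|"):
        entry = entry.strip()
        if not entry:
            continue  # e.g. the empty piece after a trailing '|'
        fields = [field.strip() for field in entry.split(";")]
        if len(fields) != 3:
            continue  # skip anything that doesn't match the assumed format
        title, _target, url = fields  # the frame target isn't needed for indexing
        href = html.escape(url, quote=True)
        text = html.escape(title, quote=False)
        anchors.append(f'<a href="{href}">{text}</a>')
    return "\n".join(anchors)


def build_link_page(param_value: str, title: str = "Archive index") -> str:
    """Wrap the anchors in a minimal HTML page that a spider can crawl."""
    body = applet_entries_to_links(param_value)
    return (
        f"<html><head><title>{html.escape(title)}</title></head><body>\n"
        f"{body}\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    sample = "Rogues' Gallery;right_frame;/news/archives/2000/rogues.html|"
    print(build_link_page(sample))
```

Fed the full parameter value from archives.html, this would give one `<a href>` per archived file.  Serving the result as a plain page linked from archives.html (or listed in start_url) should let htdig follow those links.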

Does this help you?

Is there any solution... or do I need to abandon the project
(I would hate to after coming so far!).

I am embarrassed I have to ask all these questions, but
thanks for bearing with me.
RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

Hi, thanks, Fred!

I'm fighting a splitting headache over this, so pardon if I am a bit dense.

OK, I think I see the THEORY you are speaking of.  It's still a bit fuzzy in 
my mind.

Nonetheless, the question now is:  What is the practical SOLUTION (if any)?

In other words, what can I do to have my files indexed and found?

Meanwhile, I am attempting to enter more path and URL statements in
the htdig.conf file and running rundig, once again.

I'll then test to see if more files and text were picked up that way.

If you could kindly afford me a specific solution (such as "type this"
or "add this"), that would be a big help.  Or, possibly refer me to some example.
That is about all I can do, insofar as I am not very technical.

We do have links from an "index" of sorts in many sub-directories,
but it is contained in a Java applet and I have doubts that it would be
picked up by the "spider" (although I am not sure and am testing now
to determine this).

Thanks, again.  I really would like to use this thing!

RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread Matthes, Fred

If I can try to help here: this is my first time writing to this type of
mailing list, so please forgive me.

Jason (I hope that I get these names correct), you are talking and thinking
in terms of directory structures: starting at the directory where your home
page is and perusing files.

Dennis is talking about web structures, which I think is how htdig works.  Imagine
a spider crawling around a web perusing the files (links) that it discovers.
It 'recurses' down a link it discovers until it does all possible links.
The only difference between this and the directory tree walk is that you
have to keep track of the links you've visited since many pages might have a
link back to your home page for example.  If you blindly followed all links,
you'd just go round and round.

Now there are many webs out there, just as there are many directories on
your disks.  You start somewhere in a directory tree by supplying a
directory to a program that walks through (recurses) the tree visiting all
the leaves below that directory.

You start a spider on a particular web by supplying a url.  The spider
begins to walk a web branch (recurses) when it discovers a link on the
current page.  When that branch is exhausted, it then continues until it
discovers another link (url).  It does this until it has walked all links in
that particular web.

Now just because you have files in your directory that you use for your web
files does not necessarily mean that those files are on your web.  Yes, you
can access these with a browser.  But can you go to one page and, just by
clicking on hyperlinks, visit all of these files?  As soon as you have to
type in a url, you have discovered a file that htdig will not find.

These files do not have to be listed on the home page.  But they do need to
be accessible via links on your web site.  They have to be in a url in one
of the pages that the spider began crawling through.  I define a web site as
all of those files connected to some page.  Usually, I think most people
think of that page as the home page, but it doesn't have to be.  So those
files that you want htdig to find must exist somewhere on the web in a
link.
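The walk described above can be sketched in a few lines of Python.  The link graph here is a hypothetical in-memory dictionary standing in for fetched pages, and the traversal is breadth-first rather than recursive; without a depth limit both orders reach the same set of pages, which is the point that matters: any file on disk that is not in that set will not be found.

```python
from collections import deque


def crawl(start_url, links):
    """Return every URL reachable from start_url by following links.

    links maps a page URL to the URLs it links to (a stand-in for
    fetching each page and parsing its HTML)."""
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:  # remember visited pages so back links don't loop forever
                seen.add(target)
                queue.append(target)
    return seen


def missed_files(all_urls, start_url, links):
    """URLs that exist on the server but can't be reached by clicking."""
    return sorted(set(all_urls) - crawl(start_url, links))


if __name__ == "__main__":
    site = {
        "/": ["/about.html", "/pets/"],
        "/about.html": ["/"],  # a back link to the home page
        "/pets/": ["/pets/dogs.html", "/pets/cats.html", "/"],
    }
    on_disk = ["/", "/about.html", "/pets/", "/pets/dogs.html",
               "/pets/cats.html", "/pets/fish.html"]
    print(missed_files(on_disk, "/", site))  # ['/pets/fish.html']
```

The "seen" set is the bookkeeping mentioned above: without it, the back links to "/" would send the walk round and round.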

I hope that this helps.

To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  <http://www.htdig.org/mail/menu.html>
FAQ:<http://www.htdig.org/FAQ.html>


Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

Thanks, Dennis.

I'll make this quick (because I know you are busy).

At 01:02 PM 12/14/00 -0600, you wrote:

>The big question is how is a viewer ever going to get to those 3,000
>pages?  

The hope is that they can reach these pages by using a good keyword
search engine.  We previously used Webglimpse and it worked just fine.
All you did was type in any word and it brought up any text from anywhere
on the site.

>How do they know about them???  

You do not know about them.  That is why we need the search engine.

>Can they click their way to every page starting at your home page?  

Nope, they cannot.

We do have a Java applet which they can click on which directs them to
about a dozen pages which give the viewer a general idea of the site.
But, of course, not the specific data they seek.  Our site is
like a large reference library.

If this program simply does not index all the files or
cannot be configured to do so, just tell me.  I have already spent
14 hours on this, getting nowhere, and can go back to Webglimpse, if
need be.

>If the answer is yes, they should get indexed by htdig.  

The answer is no.

>If the answer is no, then how do your visitors ever find
>those pages ?

Many find them because they are picked up by various web search engines,
such as Yahoo, AltaVista, Excite, etc., or through other sites
which place links to them.  But we need a good search engine
which will index all the files on our site, to be more specific
to our site.  Htdig looked nice, at first.

If there is any way to use it, and if it can be configured to
index all our files and pages, I'd like to do so.

I can send a file with our opening page, if that would help.

Thanks.

>If you have more questions, you'll have to go back to the list.
>Hope I was some help.

-
The Nationalist Movement
PO Box 2000
Learned MS 39154
(601) 885-2288
Clinic: http://www.nationalist.org/board/html/index.php
Crosstarlist: http://www.nationalist.org/docs/resources/list.html
E-mail: mailto:[EMAIL PROTECTED]
Forum: http://www.nationalist.org/forum/index.php
Home Page: http://www.nationalist.org
ICQ: 5429992
Newsgroup: alt.national
Views not necessarily those of The Nationalist Movement
© 2000 by The Nationalist Movement
-

END


Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

Thanks, Dennis:

Let me make this quick, then.

We have approximately 3,000 files on our site.

Are you saying that we must list all 3,000 by name, path and
directory on our starting page in order for them to be indexed?

If so, where and how on the "starting page?"  Would this refer
to our "index.html" on our site (which is our default home page)?

Or, do you mean to list them all in the htdig.conf file?

If I misunderstand, kindly advise me (at your convenience).
Sorry it is unclear, at this point.  But I've never seen another
search engine operate in this manner.  Usually, this is all done 
automatically.

Thanks, again.

At 12:44 PM 12/14/00 -0600, you wrote:
>I'm not sure I'm making myself clear, and unfortunately, I am swamped
>with work.  I can't spend any more time on this right now.
>
>You don't have to have files called index.html, you just have to have 
>a path of references from the start page, to any other page that you 
>want indexed.


Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

Hi, one additional point regarding "index" files, if I understand your point
correctly.

Are you saying that I must have files called "index.html" in
order for the program to index the files under a given subdirectory?

Here is a brief explanation.

In each sub-directory, at present, I have a file named after the
sub-directory itself, which serves as an index.

For instance, if the sub-directory is named "dogs" I have a file 
in it called "dogs.html" which then lists the various files in
that subdirectory, with links to them (such as "fleas.html," 
"doghouse.html," etc.).

Now, are you saying that I must change "dogs.html" to
"index.html" in order for htdig to index the site completely?
Or, am I misreading what you said?

This could be done, if necessary, I suppose, but remember, we have
thousands of files and hundreds of sub-directories.  The task would
be tantamount to a Florida recount!

Please advise.  Thanks.


Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

OK, thanks, Dennis.  This explanation is a lot clearer.

I was under the impression that the program simply
indexed the entire site, minus any files or sub-directories
which were specifically excluded.

At least, that appeared to be the implication.

OK, well, I'm glad to know that I was wrong on this.

So, the question now is, how to make it operate to index and
find ALL files? (All of my files are html files, by the way).

Per your example, I do not have many links to other pages on
the main page.  In fact, most pages (there are thousands of
them) on my site do not have any links on them at all.

So, per your explanation, these files would not be indexed.  
It appears that many are indexed, however, at present,
somehow, which throws me off, a bit.

Anyhow...

My main (start-up) page is contained in an "index.html"
page, but that is the only one which has an "index.html" file.
The others just have names like you suggested, cat.html, dog.html, etc.
And, they are all under many, many subdirectories.

So... as to solution...

Are you saying that I should expressly list ALL of my subdirectories in 
htdig.conf at:

start_url:

I want to be sure on this because I have HUNDREDS of subdirectories.

If the answer is "yes," then I will proceed to list the entire directory and
sub-directory structure there...

but...

One more point.

Can I get by with listing just the MAIN sub-directories, or do I need to list
all the sub-sub-sub-directories (some of them going quite deep)?
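To avoid typing hundreds of directories into start_url, one approach is to generate a single page that links to every HTML file and make that page reachable by htdig, either by linking to it from the home page or by adding its URL to start_url (which accepts a list of URLs).  A minimal Python sketch, assuming the web server's document root is a local directory whose layout matches the URL paths; `doc_root` and `base_url` are placeholders, not htdig settings:

```python
import html
import os
import tempfile
from urllib.parse import quote


def build_site_map(doc_root, base_url, suffixes=(".html", ".htm")):
    """Return an HTML page with one link per matching file under doc_root.

    Assumes the URL path of each file mirrors its path below doc_root."""
    lines = ["<html><head><title>Site map</title></head><body>"]
    for dirpath, dirnames, filenames in os.walk(doc_root):
        dirnames.sort()  # walk sub-directories in a stable order
        for name in sorted(filenames):
            if not name.lower().endswith(suffixes):
                continue  # only HTML pages go into the map
            rel = os.path.relpath(os.path.join(dirpath, name), doc_root)
            rel = rel.replace(os.sep, "/")
            url = base_url.rstrip("/") + "/" + quote(rel)
            lines.append(f'<a href="{html.escape(url)}">{html.escape(rel)}</a><br>')
    lines.append("</body></html>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Tiny throwaway tree standing in for a real document root.
    with tempfile.TemporaryDirectory() as root:
        for rel in ["index.html", "news/archives/2000/rogues.html", "notes.txt"]:
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as fh:
                fh.write("<html></html>\n")
        print(build_site_map(root, "http://www.example.com/"))
```

Because os.walk descends into every sub-directory, depth doesn't matter here.  The map would need regenerating whenever files are added, before the next rundig.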

Hope I'm getting close to a solution.

Thanks for the tips and patience.

At 10:34 AM 12/14/00 -0600, you wrote:
>On Thu, 14 Dec 2000, crosstar wrote:
>
>> Thanks for taking the time to reply.  :)
>> 
>> I might sound a bit dense (sorry), but bear with me, as I hope
>> to get this thing operable.
>> 
>> What do you mean by:  
>> 
>> "Path of links on pages?"
>> 
>> "Starting at the start page?"
>> 
>..
>
>> 
>> I am trying to understand your analysis, but could you,
>> perhaps, simply tell me what exactly to do (such as,
>> "type this," "cut and paste" that, or something practical
>> (rather than just theoretical).
>> 
>
>Well, I can't give you cut and paste steps, but let me try again ..
>
>If your start page is  http://www.nationalist.org/  and on that page
>is a link to say  http://www.nationalist.org/gerbils.htm, then those 2 
>pages get indexed.
>
>If the start page has a link to
>http://www.nationalist.org/pets, and the pets subdirectory has an
>index.html page which includes links to dogs.html and cats.html, then
>you will index  http://www.nationalist.org/pets/dogs.html  and
>http://www.nationalist.org/pets/cats.html.
>
>BUT! If the pets subdirectory contains a page called fish.html and there is
>no link to fish.html in pets/index.html or on
>http://www.nationalist.org/, then fish.html will not be indexed, because
>htdig never saw it.
>
>In other words, htdig gets the start page, looks inside for links, gets
>those pages, looks inside those for links, gets those pages, and so on.
>Just because a page is accessible on your server is not enough.
>
>That's the best I can do..

-
The Nationalist Movement
PO Box 2000
Learned MS 39154
(601) 885-2288
Clinic: http://www.nationalist.org/board/html/index.php
Crosstarlist: http://www.nationalist.org/docs/resources/list.html
E-mail: mailto:[EMAIL PROTECTED]
Forum: http://www.nationalist.org/forum/index.php
Home Page: http://www.nationalist.org
ICQ: 5429992
Newsgroup: alt.national
Views not necessarily those of The Nationalist Movement
© 2000 by The Nationalist Movement
-

END

To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  
FAQ:

Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread Dennis Director

On Thu, 14 Dec 2000, crosstar wrote:

> Thanks for taking the time to reply.  :)
> 
> I might sound a bit dense (sorry), but bear with me, as I hope
> to get this thing operable.
> 
> What do you mean by:  
> 
> "Path of links on pages?"
> 
> "Starting at the start page?"
> 
..

> 
> I am trying to understand your analysis, but could you,
> perhaps, simply tell me what exactly to do (such as,
> "type this," "cut and paste" that, or something practical
> (rather than just theoretical).
> 

Well, I can't give you cut and paste steps, but let me try again ..

If your start page is  http://www.nationalist.org/  and on that page
is a link to say  http://www.nationalist.org/gerbils.htm, then those 2 
pages get indexed.

If the start page has a link to
http://www.nationalist.org/pets, and the pets subdirectory has an
index.html page which includes links to dogs.html and cats.html, then
you will index  http://www.nationalist.org/pets/dogs.html  and
http://www.nationalist.org/pets/cats.html.

BUT! If the pets subdirectory contains a page called fish.html and there is
no link to fish.html in pets/index.html or on
http://www.nationalist.org/, then fish.html will not be indexed, because
htdig never saw it.

In other words, htdig gets the start page, looks inside for links, gets
those pages, looks inside those for links, gets those pages, and so on.
Just because a page is accessible on your server is not enough.

That's the best I can do..
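The explanation in this message is a reachability question: htdig only sees pages it can reach by following links from the start URL. A minimal Python sketch of that idea follows. It uses a hand-written link table in place of real fetching and parsing. The crawl function, the table and the www.example.org host are illustrative assumptions, and the breadth-first order is not necessarily the order htdig itself uses.

```python
from collections import deque


def crawl(start_urls, links):
    """Return the set of pages reachable from start_urls by following links.

    links maps a page URL to the URLs that page links to; it stands in for
    fetching each page and parsing its links (an illustrative assumption).
    """
    seen = set(start_urls)
    queue = deque(start_urls)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:  # queue each page only once
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site mirroring the pets example: fish.html exists on the
# server, but no page links to it.
site = {
    "http://www.example.org/": [
        "http://www.example.org/gerbils.htm",
        "http://www.example.org/pets/",
    ],
    "http://www.example.org/pets/": [
        "http://www.example.org/pets/dogs.html",
        "http://www.example.org/pets/cats.html",
    ],
    "http://www.example.org/pets/fish.html": [],
}
```

Calling crawl(["http://www.example.org/"], site) reaches the front page, gerbils.htm, the pets index, dogs.html and cats.html, but never fish.html, even though it is in the table.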


Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

Thanks for taking the time to reply.  :)

I might sound a bit dense (sorry), but bear with me, as I hope
to get this thing operable.

What do you mean by:  

"Path of links on pages?"

"Starting at the start page?"

For example, in htdig.conf, at present, there is:

# This specifies the URL where the robot (htdig) will start.  You can specify
# multiple URLs here.  Just separate them by some whitespace.
# The example here will cause the ht://Dig homepage and related pages to be
# indexed.
# You could also index all the URLs in a file like so:
# start_url:  `${common_dir}/start.url`
#
start_url: http://www.nationalist.org/
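The comment block above says that start_url accepts several whitespace-separated URLs, and that lines starting with # are only comments. The minimal Python sketch below shows how such a line might be read. It is not htdig's actual parser. It assumes a later start_url line overrides an earlier one, and it does not expand backquoted file includes. The function name is hypothetical.

```python
def start_urls_from_conf(conf_text):
    """Return the URLs on the active start_url line of htdig.conf-style text.

    Simplified sketch: skips blank and '#' comment lines, lets a later
    start_url line override an earlier one (an assumption), splits the
    value on whitespace, and does not expand backquoted file includes.
    """
    urls = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # commented-out settings have no effect
        name, sep, value = line.partition(":")  # split at the first colon only
        if sep and name.strip() == "start_url":
            urls = value.split()
    return urls
```

If the active line were accidentally left commented out (#start_url: ...), this sketch would return an empty list. That is a quick thing to rule out when nothing is being dug.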

Are you saying that this is incorrect?  Do I need to add
something here?

If so, what?

Or, are you stating that something must be added to each page
in the site?  There are thousands of pages and this would be
a rather difficult proposition (hope that wouldn't be necessary).

If you are saying that something has to be added on each page,
what -- exactly -- needs to be added, please.

What do you mean by:

"leave the nav bar indexable, at least on the front page"

The reason I am in the dark is that about 40% of the site appears
to be indexing all right.  But the other 60% is lost.  

I am trying to understand your analysis, but could you,
perhaps, simply tell me what exactly to do (such as,
"type this," "cut and paste" that, or something practical
(rather than just theoretical).

As I said, I am not very technical.

Much appreciated.

At 09:50 AM 12/14/00 -0600, you wrote:
>On Wed, 13 Dec 2000, crosstar wrote:
>
>> I am not too technical, so I hope this sounds clear.
>> 
>> I have htdig installed.  But, although it works fine with no
>> errors, many files and words are being left out of the search and
>> indexing.
>
>I'm not any kind of htdig expert, but ..
>on my sites I discovered (the slow way) that htdig recurses down from
>the start page that you give it in the config file.  If there is not a
>path of links on pages, starting at the start page, that gets you to the
>subdirectory you mention, then those pages will not be fetched and
>indexed.  This happened to me when I set  tags around my
>navigation bar on all pages.  I needed to leave the nav bar indexable, at
>least on the front page, so that there were references to all sections.
>Hope this helps!


Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread Dennis Director

On Wed, 13 Dec 2000, crosstar wrote:

> I am not too technical, so I hope this sounds clear.
> 
> I have htdig installed.  But, although it works fine with no
> errors, many files and words are being left out of the search and
> indexing.

I'm not any kind of htdig expert, but ..
on my sites I discovered (the slow way) that htdig recurses down from
the start page that you give it in the config file.  If there is not a
path of links on pages, starting at the start page, that gets you to the
subdirectory you mention, then those pages will not be fetched and
indexed.  This happened to me when I set  tags around my
navigation bar on all pages.  I needed to leave the nav bar indexable, at
least on the front page, so that there were references to all sections.
Hope this helps!
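The nav-bar problem can be modeled as a filtering step before links are collected. Suppose the only links to each section sit inside a region the indexer skips, and, as this poster found, links in that region are not followed. Then nothing leads to those sections. The minimal Python sketch below illustrates this. Its placeholder markers stand in for the tag names that were lost from the message and are NOT htdig's actual syntax. It uses a deliberately simple regex instead of a real HTML parser.

```python
import re

# Hypothetical placeholder markers: the real tag names were lost from the
# message, so these strings are NOT htdig's actual syntax.
NOINDEX_START = "<!--noindex-start-->"
NOINDEX_END = "<!--noindex-end-->"

# Deliberately simple: only double-quoted href values are recognized.
HREF_RE = re.compile(r'href\s*=\s*"([^"]*)"', re.IGNORECASE)

# Non-greedy match from a start marker to the next end marker, across lines.
NOINDEX_RE = re.compile(
    re.escape(NOINDEX_START) + r".*?" + re.escape(NOINDEX_END), re.DOTALL
)


def followable_links(html):
    """Return href targets outside any no-index region, in page order.

    Models the behavior the poster describes (links inside the skipped
    region are not followed); it is not taken from htdig's source.
    """
    visible = NOINDEX_RE.sub("", html)
    return HREF_RE.findall(visible)
```

With the whole nav bar wrapped in the markers, a front page whose section links appear only in the nav bar yields no section links at all. Leaving the nav bar unwrapped on at least the front page restores a path to every section.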


[htdig] Words and files not being found or indexed

2000-12-13 Thread crosstar


I am not too technical, so I hope this sounds clear.

I have htdig installed.  But, although it works fine with no
errors, many files and words are being left out of the search and
indexing.

I have checked all of the relevant FAQ, but either do not
understand what I am to do or am falling short, in some other way.

In reply to my earlier message, I was told to check the output using
-vvv.  I did so and here is what I found.

For example, I have a subdirectory which contains 70 files,
in /news/archives/2000/.

7 of these files turn up listed in the output.  But, where are the other 63?
They are not there and there is no reference to them in the entire
output file.
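One way to see exactly which of the 70 files were never reached is to compare the directory listing against the URLs that did turn up. The minimal Python sketch below makes several assumptions: the archive directory is readable locally, it maps to one base URL, and the found URLs have been collected by hand from the htdig output. The function name and inputs are hypothetical, and file names are not URL-encoded.

```python
import os


def missing_from_index(directory, base_url, found_urls,
                       extensions=(".html", ".htm")):
    """List URLs for HTML files in `directory` that are absent from found_urls.

    directory is the local path served at base_url (both hypothetical
    inputs); file names are appended to base_url without URL-encoding.
    """
    base = base_url.rstrip("/")
    found = set(found_urls)
    missing = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.lower().endswith(extensions) and os.path.isfile(path):
            url = base + "/" + name
            if url not in found:
                missing.append(url)
    return missing
```

Each URL in the result can then be checked for a page that links to it. If no such page exists, htdig has no path to that file from the start URL.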

So, I am stumped as to what to do now.

Any assistance appreciated.  

HQ
