Re: [htdig] different search results

2000-11-20 Thread David Adams

Gilles R. Detillieux wrote:
 
 According to gkalter:
  Hope this mailing-list is the right one..;-)
  
  Today I got htdig to work pretty well on a site containing many
  PDF-Files.
  
  • Cobalt Raq2 microserver (mips) with RedHat based Linux
  
  After updating the C++ Compiler (see mailing list) I got rid of the
  segmentation error messages and htdig worked well.
  
  Cryptic outputs of the search form were solved by adding a ".cgi"
  extension to htsearch
  in the local cgi-bin folder. Solution also found in the list - thanks to
  all those helpful people!
 
 I think the FAQ also has some pointers on getting the CGI to work.
 
  Because I wanted to get direct links to single PDF Pages out of the
  found excerpts I got
  the pdftodig.py script for external parsing of PDF-Files. (Do I have to
  mention that python
  IS NOT installed on Cobalt Raqs?) O.K. this problem could also be
  solved.
 
 It would also be a fairly trivial change to the perl scripts conv_doc.pl
 or doc2html.pl to make it replace form feeds in pdftotext output with
 the correct HTML <a name="..."> tags for the anchors.  You'd then be
 using an external converter, rather than an external parser, and possibly
 avoiding parser-related problems.
 
  Now everything works pretty well with one little exception.
  
  Using a complete search string e.g. "Sensor" lists all matching
  documents and the text contains
  the search word (bold typeface) with a link to the specific single Page
  of the found PDF file.
  (Great!)

I think I may be missing something here; perhaps somebody can explain it
for me.  Am I right in thinking that the whole and only point of this is
to produce, in the lists produced by htsearch, excerpts from the first
page of .PDF documents containing a search word?

Or does one really get a link which, when followed, brings up the .PDF
document open at the relevant page?  If so, that would be quite something,
especially if it worked for a range of browsers.  What would be the correct
HTML <a name="..."> tags for the anchors?


-- 
 
David Adams
[EMAIL PROTECTED]
Computing Services
University of Southampton


To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  http://www.htdig.org/mail/menu.html
FAQ:            http://www.htdig.org/FAQ.html




Re: [htdig] different search results

2000-11-20 Thread Geoff Hutchison

On Mon, 20 Nov 2000, David Adams wrote:

 Or does one really get a link which when followed brings up the .PDF
 document open at the relevant page?  If so, that would be quite something,
 especially if it worked for a range of browsers.  What would be the correct
 HTML <a name="..."> tags for the anchors?

This is on the right track. Basically, you can pass along information to
Acrobat to open to a particular page. So AFAIK, it works with all browsers
that support the Acrobat PDF plugin.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







[htdig] Question about search engine

2000-11-20 Thread Dmitry Lesov

Dear Sir/Madam,
I have a specific question about the search engine. I am looking for a
search engine smart enough to target HTML pages back into their original
frame set. Does your search engine have this capability? Please
reply ASAP. Thank you.
Sincerely,
Dmitry Lesov
(416)323-1981
A.K.A. New Media Inc.
www.akanewmedia.com










Re: [htdig] Question about search engine

2000-11-20 Thread Doug Barton

Dmitry Lesov wrote:
 
 Dear Sir/Madam,
 I have a specific question about the search engine. I am looking for a
 search engine smart enough to target HTML pages back into their original
 frame set. Does your search engine have this capability? Please
 reply ASAP. Thank you.

No search engine I know of does; it's actually a very difficult problem to
solve. However, there are some JavaScript tricks you can use. Take a look
at http://www.zdnet.com/devhead/stories/articles/0,4413,2438662,00.html

Good luck,

Doug
-- 
 Any sufficiently advanced technology is indistinguishable from magic.
 -- Arthur C. Clarke







Re: [htdig] different search results

2000-11-20 Thread Gilles Detillieux

According to Geoff Hutchison:
 On Mon, 20 Nov 2000, David Adams wrote:
  Or does one really get a link which when followed brings up the .PDF
  document open at the relevant page?  If so, that would be quite something,
  especially if it worked for a range of browsers.  What would be the correct
  HTML <a name="..."> tags for the anchors?
 
 This is on the right track. Basically, you can pass along information to
 Acrobat to open to a particular page. So AFAIK, it works with all browsers
 that support the Acrobat PDF plugin.

Pierre Olivier discussed the technique some months ago on this list, and
has a web page that describes it.  I forget the URL, but you'll find it
quickly with a Google.com search for "pdftodig".  There's also a little
script that implements the same capability in xpdf, for locating the right
page.  The technique involves using a cgi script URL in the anchor tag,
with the cgi script spitting out some XML for Acrobat.
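
The simpler alternative mentioned earlier in this thread, replacing the
form feeds in pdftotext output with named anchors, can be sketched in a
few lines.  This is a minimal Python sketch, not a patch to conv_doc.pl or
doc2html.pl: the function name pages_to_html and the anchor names page1,
page2, ... are assumptions chosen for illustration.  It relies only on
pdftotext ending each page with a form feed character; how the resulting
anchors get mapped back to a page in the PDF itself is left open.

```python
import html


def pages_to_html(text, title="document"):
    """Wrap pdftotext output in HTML, with one named anchor per page.

    Pages are assumed to be separated by form feeds ("\\f"), and the
    anchor naming scheme (page1, page2, ...) is hypothetical.
    """
    pages = text.split("\f")
    # A trailing form feed leaves an empty last chunk; drop it.
    if pages and not pages[-1].strip():
        pages.pop()

    parts = ["<html><head><title>%s</title></head><body>" % html.escape(title)]
    for number, page in enumerate(pages, start=1):
        # The anchor marks where this page's text begins.
        parts.append('<a name="page%d"></a>' % number)
        # Escape the page text so stray "<" or "&" cannot break the markup.
        parts.append("<pre>%s</pre>" % html.escape(page))
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = "First page text\fSecond page about a Sensor\f"
    print(pages_to_html(sample, "example.pdf"))
```

Used as an external converter, something like this would leave the actual
parsing to htdig's own HTML parser, which is the advantage described above.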

-- 
Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930






Re: [htdig] Question about search engine

2000-11-20 Thread nets

At 11:41 AM -0800 11/20/2000, Doug Barton wrote:
Dmitry Lesov wrote:

  Dear Sir/Madam,
  I have a specific question about the search engine. I am looking for a
  search engine smart enough to target HTML pages back into their original
  frame set. Does your search engine have this capability? Please
  reply ASAP. Thank you.

 No search engine I know of does; it's actually a very difficult problem
 to solve. However, there are some JavaScript tricks you can use. Take a
 look at http://www.zdnet.com/devhead/stories/articles/0,4413,2438662,00.html

Actually, MondoSearch does this rather nicely: http://www.mondosearch.com
The drawback is that they have to reindex the entire site to do an update.

Avi


-- 
_
Complete Guide to Search Engines for Web Sites, Intranets, 
   and Portals: http://www.searchtools.com






Re: [htdig] Question about search engine

2000-11-20 Thread Doug Barton

[EMAIL PROTECTED] wrote:
 
 At 11:41 AM -0800 11/20/2000, Doug Barton wrote:
 Dmitry Lesov wrote:
 
   Dear Sir/Madam,
   I have a specific question about the search engine. I am looking for a
   search engine smart enough to target HTML pages back into their original
   frame set. Does your search engine have this capability? Please
   reply ASAP. Thank you.
 
  No search engine I know of does; it's actually a very difficult problem
  to solve. However, there are some JavaScript tricks you can use. Take a
  look at http://www.zdnet.com/devhead/stories/articles/0,4413,2438662,00.html
 
 Actually, MondoSearch does this rather nicely: http://www.mondosearch.com
 The drawback is that they have to reindex the entire site to do an update.

I should have clarified that I meant "freely available, open source"
search engine. :)

Doug
-- 
 Any sufficiently advanced technology is indistinguishable from magic.
 -- Arthur C. Clarke







[htdig] configuration problem

2000-11-20 Thread Morgan

Or so I think...
I have a message board on my web site.  When the local host views the
index page of the message board, it prints links to every single message
ever posted (so that htdig can index all messages, even those that are so
old they are not normally seen).
The links are of the form http://blah.com/messageboard/read.php?id=123456
Other URLs that have a "?" in them are indexed just fine.
Whatever message happens to appear at the top of the page gets indexed
just fine, but only that one.  The other thousands don't get touched by
htdig.
Any ideas on what's wrong (with htdig or, more likely, me! :)?
Thanks,
Morgan










Re: [htdig] configuration problem

2000-11-20 Thread Geoff Hutchison

At 8:39 PM -0600 11/20/00, Morgan wrote:
Whatever message happens to appear at the top of the page gets indexed
just fine, but only that one.  The other thousands don't get touched by
htdig.

Yes, you're correct that it's a configuration problem.

See http://www.htdig.org/attrs.html#max_doc_size and 
http://www.htdig.org/FAQ.html#5.1
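
One way to check whether max_doc_size could explain the symptom is to count
how many links fall inside the first max_doc_size bytes of the index page,
since htdig retrieves no more than that much of a document.  The following
is a rough Python sketch, not htdig's own parser: the helper names
LinkCollector and links_within are made up for illustration, and the
100000-byte limit in the demo is an assumed value, so check attrs.html for
the setting in effect on your installation.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_within(html_bytes, max_doc_size):
    """Return the links visible in the first max_doc_size bytes only."""
    parser = LinkCollector()
    # Truncate first, then parse: a tag cut off at the limit is never
    # completed, so it is not reported as a link.
    parser.feed(html_bytes[:max_doc_size].decode("latin-1"))
    return parser.links


if __name__ == "__main__":
    # Build a fake index page with 5000 message links (hypothetical data).
    page = b"".join(
        b'<a href="/messageboard/read.php?id=%d">message</a>\n' % i
        for i in range(5000)
    )
    seen = links_within(page, 100000)  # assumed limit, see lead-in
    print("%d of 5000 links fall inside the limit" % len(seen))
```

If the count comes out well below the number of messages, raising
max_doc_size is worth trying; if every link is already inside the limit,
the cause lies elsewhere.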

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re: [htdig] configuration problem

2000-11-20 Thread Morgan

If it is a configuration problem (or a user error! :), it's not that one.
I set max_doc_size to an obscenely large number (bigger than I already
had it).  I know htdig is getting there because it indexes the list of
messages and the first link on the page, but quits after that.  I assume 9g
is large enough for max_doc_size? :)
Any more ideas?


On Mon, 20 Nov 2000, Geoff Hutchison wrote:

 At 8:39 PM -0600 11/20/00, Morgan wrote:
 Whatever message happens to appear at the top of the page gets indexed
 just fine, but only that one.  The other thousands don't get touched by
 htdig.
 
 Yes, you're correct that it's a configuration problem.
 
 See http://www.htdig.org/attrs.html#max_doc_size and 
 http://www.htdig.org/FAQ.html#5.1
 
 --
 -Geoff Hutchison
 Williams Students Online
 http://wso.williams.edu/
 
 
 







Re: [htdig] configuration problem

2000-11-20 Thread Geoff Hutchison

At 9:56 PM -0600 11/20/00, Morgan wrote:
If it is a configuration problem (or a user error! :), it's not that one.
I set max_doc_size to an obscenely large number (bigger than I already
had it).  I know htdig is getting there because it indexes the list of
messages and the first link on the page, but quits after that.  I assume 9g
is large enough for max_doc_size? :)

Well, I wouldn't suggest setting it larger than the amount of RAM you 
have (or at the very least, not larger than RAM+swap).

In any case, I think I'd need to see an example page to have much more
of an idea what's going on. I assume you've tried running htdig with,
say, -vvv for debugging and taken a look at the output?

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

