Re: [htdig] different search results
Gilles R. Detillieux wrote:

According to gkalter:

Hope this mailing list is the right one.. ;-) Today I got htdig to work pretty well on a site containing many PDF files. Cobalt Raq2 microserver (mips) with RedHat based Linux. After updating the C++ compiler (see mailing list) I got rid of the segmentation error messages and htdig worked well. Cryptic outputs of the search form were solved by adding a ".cgi" extension to htsearch in the local cgi-bin folder. Solution also found in the list - thanks to all those helpful people!

I think the FAQ also has some pointers on getting the CGI to work.

Because I wanted to get direct links to single PDF pages out of the found excerpts I got the pdftodig.py script for external parsing of PDF files. (Do I have to mention that Python IS NOT installed on Cobalt Raqs?) O.K. this problem could also be solved.

It would also be a fairly trivial change to the Perl scripts conv_doc.pl or doc2html.pl to make them replace form feeds in pdftotext output with the correct HTML <a name="..."> tags for the anchors. You'd then be using an external converter, rather than an external parser, and possibly avoiding parser-related problems.

Now everything works pretty well with one little exception. Using a complete search string, e.g. "Sensor", lists all matching documents and the text contains the search word (bold typeface) with a link to the specific single page of the found PDF file. (Great!)

I think I may be missing something here, perhaps somebody can explain for me. Am I right in thinking that the whole and only point of this is to produce, in the lists produced by htsearch, excerpts from the first page of .PDF documents containing a search word? Or does one really get a link which when followed brings up the .PDF document open at the relevant page? If so, that would be quite something, especially if it worked for a range of browsers. What would be the correct HTML <a name="..."> tags for the anchors?
-- David Adams [EMAIL PROTECTED] Computing Services University of Southampton To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] You will receive a message to confirm this. List archives: http://www.htdig.org/mail/menu.html FAQ:http://www.htdig.org/FAQ.html
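For illustration only, the form-feed replacement Gilles suggests for conv_doc.pl or doc2html.pl could look roughly like the sketch below. It is written in Python rather than Perl and is not taken from either script. It assumes the pdftotext output is already in a string with pages separated by form feed characters, and the anchor names (page1, page2, ...) and the function name pages_to_html are made up for this example.

```python
import html


def pages_to_html(text: str, anchor_prefix: str = "page") -> str:
    """Turn pdftotext output into HTML with one named anchor per page.

    Pages are assumed to be separated by form feed characters ("\f").
    The anchor naming scheme (page1, page2, ...) is an assumption.
    """
    pages = text.split("\f")
    # A form feed after the last page leaves an empty trailing chunk; drop it.
    if pages and pages[-1].strip() == "":
        pages.pop()
    parts = []
    for number, page in enumerate(pages, start=1):
        # Named anchor marking the start of this page.
        parts.append(f'<a name="{anchor_prefix}{number}"></a>')
        # Escape the page text so stray "<" or "&" cannot break the HTML.
        parts.append(f"<pre>{html.escape(page)}</pre>")
    return "\n".join(parts)
```

With anchors like these in the converted text, each excerpt can be traced back to the page it came from. How the anchor is then turned into a link that opens the PDF at that page is a separate step.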
Re: [htdig] different search results
On Mon, 20 Nov 2000, David Adams wrote:

Or does one really get a link which when followed brings up the .PDF document open at the relevant page? If so, that would be quite something, especially if it worked for a range of browsers. What would be the correct HTML <a name="..."> tags for the anchors?

This is on the right track. Basically, you can pass along information to Acrobat to open to a particular page. So AFAIK, it works with all browsers that support the Acrobat PDF plugin.

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
[htdig] Question about search engine
Dear Sir/Madam,

I have a specific question about the search engine. I am looking for a search engine smart enough to target HTML pages back into their original frame set. Does your search engine have this capability? Please reply ASAP. Thank you.

Sincerely,
Dmitry Lesov
(416)323-1981
A.K.A. New Media Inc.
www.akanewmedia.com
Re: [htdig] Question about search engine
Dmitry Lesov wrote:

Dear Sir/Madam I have a specific question about the search engine. I am looking for the search engine smart enough to target HTML pages back into original frame set. Does your search engine have this capability? Please reply ASAP. Thank You

No search engine I know of does; it's actually a very difficult problem to solve. However, there are some JavaScript tricks you can use. Take a look at http://www.zdnet.com/devhead/stories/articles/0,4413,2438662,00.html

Good luck,
Doug
-- 
Any sufficiently advanced technology is indistinguishable from magic. -- Arthur C. Clarke
Re: [htdig] different search results
According to Geoff Hutchison:

On Mon, 20 Nov 2000, David Adams wrote:

Or does one really get a link which when followed brings up the .PDF document open at the relevant page? If so, that would be quite something, especially if it worked for a range of browsers. What would be the correct HTML <a name="..."> tags for the anchors?

This is on the right track. Basically, you can pass along information to Acrobat to open to a particular page. So AFAIK, it works with all browsers that support the Acrobat PDF plugin.

Pierre Olivier discussed the technique some months ago on this list, and has a web page that describes it. I forget the URL, but you'll find it quickly with a Google.com search for "pdftodig". There's also a little script that implements the same capability in xpdf, for locating the right page. The technique involves using a CGI script URL in the anchor tag, with the CGI script spitting out some XML for Acrobat.

-- 
Gilles R. Detillieux E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
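As a point of comparison, and explicitly not the CGI-plus-XML technique described above, PDF viewers that honour Adobe's documented open parameters accept a page number in the URL fragment (#page=N). A minimal sketch of building such a link follows. The function name pdf_page_link and the example.com URL are hypothetical, and whether a given browser or plugin acts on the fragment is something to check rather than assume.

```python
from urllib.parse import urldefrag


def pdf_page_link(pdf_url: str, page: int) -> str:
    """Build a link asking the PDF viewer to open pdf_url at a given page.

    Relies on the "#page=N" open parameter. This is a different mechanism
    from the CGI script approach discussed in the thread.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Drop any fragment already on the URL before adding the page parameter.
    base, _ = urldefrag(pdf_url)
    return f"{base}#page={page}"


# Hypothetical example URL.
print(pdf_page_link("http://example.com/manual.pdf", 7))
```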
Re: [htdig] Question about search engine
At 11:41 AM -0800 11/20/2000, Doug Barton wrote:

Dmitry Lesov wrote:

Dear Sir/Madam I have a specific question about the search engine. I am looking for the search engine smart enough to target HTML pages back into original frame set. Does your search engine have this capability? Please reply ASAP. Thank You

No search engine I know of does; it's actually a very difficult problem to solve. However, there are some JavaScript tricks you can use. Take a look at http://www.zdnet.com/devhead/stories/articles/0,4413,2438662,00.html

Actually, MondoSearch does this rather nicely: http://www.mondosearch.com. The drawback is that they have to reindex the entire site to do an update.

Avi
-- 
Complete Guide to Search Engines for Web Sites, Intranets, and Portals: http://www.searchtools.com
Re: [htdig] Question about search engine
[EMAIL PROTECTED] wrote:

At 11:41 AM -0800 11/20/2000, Doug Barton wrote:

Dmitry Lesov wrote:

Dear Sir/Madam I have a specific question about the search engine. I am looking for the search engine smart enough to target HTML pages back into original frame set. Does your search engine have this capability? Please reply ASAP. Thank You

No search engine I know of does; it's actually a very difficult problem to solve. However, there are some JavaScript tricks you can use. Take a look at http://www.zdnet.com/devhead/stories/articles/0,4413,2438662,00.html

Actually, MondoSearch does this rather nicely: http://www.mondosearch.com. The drawback is that they have to reindex the entire site to do an update.

I should have clarified that I meant "freely available, open source" search engine. :)

Doug
-- 
Any sufficiently advanced technology is indistinguishable from magic. -- Arthur C. Clarke
[htdig] configuration problem
Or so I think...

I have a message board on my web site. When the local host views the page, the index to the message board prints out links to every single message ever posted (so that htdig can index all messages, even those that are so old they are not normally seen). The links are of the form http://blah.com/messageboard/read.php?id=123456

Other things that have a "?" in the URL index just fine. Whatever message happens to appear at the top of the page gets indexed just fine, but only that one. The other thousands don't get touched by htdig.

Any ideas on what's wrong (with htdig or, more likely, me! :) ?

Thanks,
Morgan
Re: [htdig] configuration problem
At 8:39 PM -0600 11/20/00, Morgan wrote:

Whatever message happens to appear at the top of the page gets indexed just fine, but only that one. The other thousands don't get touched by htdig.

Yes, you're correct that it's a configuration problem. See http://www.htdig.org/attrs.html#max_doc_size and http://www.htdig.org/FAQ.html#5.1

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
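To make the max_doc_size suggestion concrete: if only the first max_doc_size bytes of a page are read, any links past that point are never seen and so never followed. The sketch below only illustrates that effect using Python's standard HTML parser. It is not htdig's parser, and the names LinkCollector and links_within_limit are invented for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag that is parsed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_within_limit(page: str, max_doc_size: int) -> list:
    """Return the hrefs found in the first max_doc_size bytes of page."""
    data = page.encode("utf-8")[:max_doc_size]
    truncated = data.decode("utf-8", errors="ignore")
    collector = LinkCollector()
    # feed() handles every complete tag; a tag cut off by the truncation
    # stays in the parser's buffer and is never reported.
    collector.feed(truncated)
    return collector.links


# A made-up index page with 1000 message links, in the style of the thread.
index = "".join(
    f'<a href="read.php?id={i}">message {i}</a>\n' for i in range(1000)
)
print(len(links_within_limit(index, 10_000)), "of 1000 links seen")
```

Comparing the index page's size in bytes with the configured limit is a quick way to rule this cause in or out.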
Re: [htdig] configuration problem
If it is a configuration problem (or a user error! :), it's not that one. I set max_doc_size to an obscenely large number (bigger than I already had it). I know htdig is getting there because it indexes the list of messages and the first link on the page, but it quits after that. I assume 9g is large enough for max_doc_size? :)

Any more ideas?

On Mon, 20 Nov 2000, Geoff Hutchison wrote:

At 8:39 PM -0600 11/20/00, Morgan wrote:

Whatever message happens to appear at the top of the page gets indexed just fine, but only that one. The other thousands don't get touched by htdig.

Yes, you're correct that it's a configuration problem. See http://www.htdig.org/attrs.html#max_doc_size and http://www.htdig.org/FAQ.html#5.1

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
Re: [htdig] configuration problem
At 9:56 PM -0600 11/20/00, Morgan wrote:

If it is a configuration problem(or a user error!:), it's not that one. I set the max_doc_size to an obscenely large number(Bigger than I already had it). I know htdig is getting there because it indexes the list of messages and the first link on the page but quits after that. I assume 9g is large enough for max_doc_size? :)

Well, I wouldn't suggest setting it larger than the amount of RAM you have (or at the very least, not larger than RAM+swap). In any case, I think I'd need to see an example page to have much more of an idea what's going on. I assume you've tried running htdig with, say, -vvv for debugging and taken a look at the output?

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/