[htdig] perl interface.
Folks,

I downloaded the htdig perl interface from sourceforge.net. I'm wondering if this module gives you any ability to set up your own calls to htdig and present the results as you like (i.e., return a list of URLs and their ratings). If it doesn't do this, could someone point me to a perl module that does, if one exists...

Thanks for any ideas,
perl hack, not c++ hack, newbie to htdig
gary

Gary Artim [EMAIL PROTECTED] 510.540.5071

To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] You will receive a message to confirm this. List archives: http://www.htdig.org/mail/menu.html FAQ: http://www.htdig.org/FAQ.html
[htdig] C++
How can we allow the user to search for the word 'C++'? We are stumped. Is this just something htdig is not capable of doing?

Thanks,
Bill Vick 972-612-8425
Re: [htdig] C++
On Tue, 5 Dec 2000, Bill Vick wrote:
> How can we allow the user to search for the word 'C++'?

See http://www.htdig.org/attrs.html#extra_word_characters

e.g. extra_word_characters: +

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
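To illustrate why 'C++' needs the extra character, here is a minimal sketch of a word splitter in Python. It is a simplified model, not ht://Dig's actual parser: the function name split_words and the rule "letters, digits, plus any extra characters form a word" are assumptions made for the illustration.

```python
import re


def split_words(text, extra_word_characters=""):
    """Split text into lowercase words.

    Letters and digits always count as word characters; anything listed in
    extra_word_characters is added to that set (hypothetical model only).
    """
    char_class = "A-Za-z0-9" + re.escape(extra_word_characters)
    return [word.lower() for word in re.findall(f"[{char_class}]+", text)]


# Without the extra character, the plus signs are treated as separators.
print(split_words("Learn C++ today"))
# With "+" added, "C++" survives as a single word.
print(split_words("Learn C++ today", extra_word_characters="+"))
```

Under this model, a document has to be re-indexed after the setting changes, since the word list is built at indexing time.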
Re: [htdig] Named characters in search output
According to Tamas Nagy:
> Hello,
>
> When using the "&rarr;" (right arrow) named character in the first part of HTML documents, htdig seems to generate "&amp;rarr; rom&aacute;" in the preview of documents. It is a bit strange, maybe a bug, because this string should generate a right arrow...
>
> Cheers,
> Tamas
>
> PS: Config: HtDig 3.0.2b2, RedHat 7

I assume you mean 3.2.0b2. This is a known problem, which is fixed in the 3.2.0b3 development snapshots. See http://www.htdig.org/FAQ.html#q5.22

--
Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
Re: [htdig] How Can I use htdig to index two or more websites?
According to Sean Harris:
> How Can I use htdig to index two or more websites?
> Thank you for your help!:-)

Just add all the URLs you want to the start_url attribute, and possibly adjust limit_urls_to if you want something less limiting than what you've put in start_url.

--
Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
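As a rough illustration of how start_url and limit_urls_to interact, here is a small Python sketch. It assumes that limit_urls_to behaves as a list of substrings, at least one of which must appear in a URL for it to be indexed; the host names and the helper url_allowed are made up for the example, and the real matching rules are in the attribute documentation.

```python
# Hypothetical sites; in htdig.conf these would be the space-separated
# values of the start_url attribute.
start_url = ["http://www.example.com/", "http://docs.example.org/"]

# Assumption: when limit_urls_to is left alone it follows start_url,
# so only URLs under the starting points are kept.
limit_urls_to = list(start_url)


def url_allowed(url, patterns):
    """Return True if the URL contains at least one of the limit patterns."""
    return any(pattern in url for pattern in patterns)


print(url_allowed("http://www.example.com/about.html", limit_urls_to))
print(url_allowed("http://docs.example.org/faq/", limit_urls_to))
print(url_allowed("http://other.example.net/", limit_urls_to))
```

Loosening limit_urls_to (for example to a shared domain suffix) would let the crawl follow links between the sites instead of stopping at each starting point.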
Re: [htdig] Help me to Search using Chinese!!!!!!!
According to Sean Harris:
> Help me to Search using Chinese!!!

I'm afraid the answer hasn't really changed from 2-1/2 weeks ago. ht://Dig only supports 8-bit character sets. See http://www.htdig.org/FAQ.html#q4.10

This topic has been discussed many times on the list, and there are still no volunteers to take on the huge amount of work it would require to adapt ht://Dig for full Unicode support, and to add in the word splitting algorithms needed for many Asian languages.

--
Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
Re: [htdig] Do me a favor
According to ellenliu:
> I have downloaded 'htdig-3.2.0b2.tar' from your site, but I have trouble running it on my personal computer. The computer has Red Hat 6.2 installed, with kernel 2.2.14. When I run './configure', at line 993 it calls 'config.sub', and then the program exits with the message 'can't run config.sub'. Would you do me a favor and tell me why this happened and, most importantly, how I can run it successfully? Moreover, when should the embedded database be compiled, and how is it compiled?
>
> HARDWARE CONFIGURATION:
>   CPU:       Pentium processor 550
>   Hard disk: 20G
>   Memory:    64M

It would probably be helpful to see the full output from the ./configure program. This package has been successfully installed before on Red Hat systems (6.1, 6.2 and others), so I would think that the most likely problem is a missing component on your system. You may also want to try the most recent development snapshot of 3.2.0b3, instead of 3.2.0b2, as many known bugs in 3.2.0b2 have since been fixed.

--
Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
Re: [htdig] restrict values and htdig.conf
According to [EMAIL PROTECTED]:
> We have 3 htdig searches on our website. There are 3 different databases that are indexed:
>
>   ../htdig/db/database1 indexed with ../conf/htdig1.conf
>   ../htdig/db/database2 indexed with ../conf/htdig2.conf
>   ../htdig/db/database3 indexed with ../conf/htdig3.conf
>
> database3 includes all sites, while the other 2 databases contain only parts of the whole. Now I want to expand the HTML form with a select option as follows:
>
>   <select name="restrict">
>   <option value="" selected>.. Database1</OPTION>
>   <option value="http://www.../">on Database2
>   <option value="http://www.../">on Database3 </OPTION>
>   </SELECT>
>
> OK! But how can I use this restrict value in my htdig.conf? According to the selection in the HTML form, I must call the right htsearch with the right database!

You seem to be confusing two alternate methods of restricting search results. You use the restrict parameter on htsearch only when searching a database that contains everything, in order to restrict the results to a subset of that database, i.e. only the URLs that match a particular pattern.

If you want the user to select separate databases, then you should leave the restrict input parameter as an empty string, and have the user select the value of the "config" input parameter, which should be one of htdig1, htdig2 or htdig3, i.e. the three configuration files you mentioned above with the directory and .conf file name extension stripped off.

--
Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
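The two methods can be sketched as the query strings a search form would end up sending to htsearch. This is only an illustration in Python: the "config" and "restrict" parameters come from the reply above, while the /cgi-bin/htsearch path, the "words" parameter name, and the helper search_url are assumptions for the example.

```python
from urllib.parse import urlencode

HTSEARCH_URL = "/cgi-bin/htsearch"  # hypothetical install location

# The three configuration names from the thread: the .conf file names
# with the directory and extension stripped off.
ALLOWED_CONFIGS = ("htdig1", "htdig2", "htdig3")


def search_url(words, config, restrict=""):
    """Build a search URL that selects a database through 'config'.

    'restrict' stays empty unless the chosen database contains everything
    and the results should be narrowed to URLs matching a pattern.
    """
    if config not in ALLOWED_CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    params = {"config": config, "restrict": restrict, "words": words}
    return HTSEARCH_URL + "?" + urlencode(params)


# Method 1: pick a separate database, leave restrict empty.
print(search_url("perl", "htdig2"))
# Method 2: search the database that has everything, narrowed by URL pattern.
print(search_url("perl", "htdig3", restrict="http://www.example.com/docs/"))
```

In an HTML form the same effect comes from naming the select element "config" rather than "restrict" and using htdig1, htdig2 and htdig3 as the option values.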
[htdig] Indexing never ends ...
I am trying to index a directory containing PHP scripts that generate dynamic web pages from records in a MySQL database. The scripts generate HTML pages that contain categories/subcategories from different states and towns, and are called with something like this:

  category.phtml?catcode=ACC&subcatcode=ACC-DEV&statecode=STATE1&TOWN=T1

I ran the indexing at 11pm last night and it still had not finished at 8am this morning. There are only 20 categories in the category table, 120 subcategories in the subcategory table, 15 states in the state table and 152 towns in the town table.

From the console output, it seems that htdig runs in an infinite loop. I wonder why.

As a clue, I suspect that the FOOTER in category.phtml is causing the problem. The footer consists of all the states, each with a hyperlink that points to:

  category.phtml?catcode=ACC&subcatcode=ACC-DEV&statecode=STATE1 for STATE1
  category.phtml?catcode=ACC&subcatcode=ACC-DEV&statecode=STATE2 for STATE2
  and so on and so forth.

AND of course the footer will (dynamically) CHANGE when a different category/subcategory is chosen:

  category.phtml?catcode=BUS&subcatcode=BUS-AUT&statecode=STATE1 for STATE1
  category.phtml?catcode=BUS&subcatcode=BUS-AUT&statecode=STATE2 for STATE2
  and so on.

I have tried reading the manual, but it's getting me nowhere. I hope my explanation is clear and precise. I appreciate your kind advice.

best regards,
Zon
Re: [htdig] Indexing never ends ...
On Wed, 6 Dec 2000, Zon Hisham Bin Zainal Abidin wrote:
> I ran the indexing at 11pm last nite and it's still not finish at 8am this morning. There are only 20 categories in the category table, 120 subcategories in the subcategory table, 15 states in the state table and 152 towns in the town table.

Well, it's not clear if you can match these independently, but if you could, this would be "only"

  20*120*15*152 = 5,472,000

pages, which in my mind would take some time. Even just 120*15*152 gives 273,600 pages. To index the latter in 9 hours would require indexing an average of 30,400 pages an hour, or better than 8 pages a second. (!)

> AND of course the footer will (dynamically) CHANGE when different category/subcategory are chosen:
>   category.phtml?catcode=BUS&subcatcode=BUS-AUT&statecode=STATE1 for STATE1
>   category.phtml?catcode=BUS&subcatcode=BUS-AUT&statecode=STATE2 for STATE2
> so on and so

OK. But I don't see how this would necessarily lead to an infinite loop. Check whether the indexing is generating two URLs that lead to the same page, e.g.:

  category.phtml?catcode=BUS&subcatcode=BUS-AUT&statecode=STATE2
  category.phtml?catcode=BUS&statecode=STATE2&subcatcode=BUS-AUT

To htdig, these are different, but they are probably the same to your code. But from your description, you haven't given any sense that this is happening, just that this seems to be taking longer than you expect.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
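The estimate above, and the point about parameter order, can be checked with a few lines of Python. This is a sketch under two assumptions: that the query parameters are separated by "&", and that the PHP script ignores their order. The helper canonical_query is hypothetical and is not something htdig does itself.

```python
from urllib.parse import parse_qsl, urlencode

# Table sizes quoted in the original message.
categories, subcategories, states, towns = 20, 120, 15, 152

all_combinations = categories * subcategories * states * towns
without_categories = subcategories * states * towns
pages_per_hour = without_categories / 9        # the run lasted about 9 hours
pages_per_second = pages_per_hour / 3600

print(all_combinations)              # 5472000
print(without_categories)            # 273600
print(pages_per_hour)                # 30400.0
print(round(pages_per_second, 1))    # 8.4


def canonical_query(query):
    """Sort the query parameters so equivalent URLs compare equal."""
    return urlencode(sorted(parse_qsl(query)))


first = "catcode=BUS&subcatcode=BUS-AUT&statecode=STATE2"
second = "catcode=BUS&statecode=STATE2&subcatcode=BUS-AUT"
print(first == second)                                      # False
print(canonical_query(first) == canonical_query(second))    # True
```

If the generated links always emit the parameters in one fixed order, each page has a single URL and the crawl stays at the smaller of the counts above.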
Re: [htdig] perl interface.
On Tue, 5 Dec 2000, Gary Artim wrote:
> set up your own calls to htdig and present the results as you like (i.e., return a list of URLs and their ratings). If it doesn't do this, could someone point me to a perl module that does, if one exists...

As things stand now, the easiest way to do this is to write a Perl "wrapper." See for example contrib/ewswrap.cgi or others in the contributed section of the website: http://www.htdig.org/contrib/

Ideally there would be a Perl XS interface to the ht://Dig code, but unless someone steps forward to do that, it's not likely to be done anytime soon.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
[htdig] Re: detailed information
Hi there,

I'm assuming you picked my name as the contact for the ht://Dig search engine package. It is a UNIX search engine, but it is not based on Oracle. In most cases, if you're looking for a way to search an Oracle database, it's often better to hire an Oracle consultant to write a custom search package that is specifically addressed to your database schemas.

If, on the other hand, you're looking for a general-purpose, open-source* web search package, feel free to browse the information on ht://Dig at: http://www.htdig.org/

*Specifically, ht://Dig is covered under the GNU GPL and is free software.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

On Tue, 5 Dec 2000, Pan, Belinda wrote:
> Hello,
>
> We are looking for a powerful search engine application based on the UNIX platform and Oracle. Please send us detailed information about your products. We would appreciate your response at your earliest convenience.
>
> Belinda Pan
> Sr. Webmaster
> [EMAIL PROTECTED]
> 416.538.7538 ext 270
Re: [htdig] Re: detailed information
On Tue, 5 Dec 2000, Geoff Hutchison wrote:
> If, on the other hand, you're looking for a general-purpose, open-source* web search package, feel free to browse the information on ht://Dig at: http://www.htdig.org/

Sorry, I couldn't resist the urge to throw in some buzzwords. :-)

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
[htdig] Re:
At 4:55 PM +0100 12/5/00, Roberta Minneci wrote:
> How do I restrict a search to word out <script language="JavaScript"> </script>?

See http://www.htdig.org/attrs.html#noindex_start

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
[htdig] Pb indexing HTML with htdig 3.1.5
Hello,

I use htdig 3.1.5 on Red Hat Linux 5.0, and I want to index a new web site. But when I run rundig I get only one document. To see what it is doing, I ran rundig -vvv and got this output:

  Header line: HTTP/1.1 200 OK
  Header line: Server: Netscape-Enterprise/3.5.1C
  Header line: Date: Wed, 06 Dec 2000 07:32:02 GMT
  Header line: Content-type: text/html
  Header line: Last-modified: Mon, 15 Nov 1999 10:45:01 GMT
  Translated Mon, 15 Nov 1999 10:45:01 GMT to 1999-11-15 10:45:01 (99)
  And converted to Mon, 15 Nov 1999 10:45:01
  Header line: Content-length: 1258
  Header line: Accept-ranges: bytes
  Header line: Connection: close
  Header line:
  returnStatus = 0
  Read 1258 from document
  Read a total of 1258 bytes
  Tag: html, matched -1
  head: size = 1258
  pick: x.y.z.t, # servers = 1
  htdig: Run complete
  htdig: 1 server seen:
  htdig: x.y.z.t:8000 1 document

I think that htdig doesn't like the HTML code "<!--//" and "//-->": it sees the beginning of the comment but not the end, and ignores the rest of the HTML code on the page. Am I right? Any other ideas? What can I do?

N.B.: The HTML code of the first page of the site is below this line.
<html>
<head>
<title>Accueil DIRECTION</title>
<base target="rtop">
<script language="JavaScript">
<!--//
var url="";
var nom="";
var bName="";
function Ouvrir() {
  bName = navigator.appName
  Version = navigator.appVersion
  Version = Version.substring(0,1)
  browserOK = ((Version >= 2))
  if (browserOK) {
    this.name="home";
    msgWindow=window.open("actu/default2.htm","popupdpd","location=no,toolbar=no,status=no,directories=no,scrollbars=yes,width=400,height=450");
    bName=navigator.appName;
    if (bName=="Netscape") msgWindow.focus();
  }
}
Ouvrir()
//-->
</script>
</head>
<frameset framespacing="0" border="false" frameborder="0" cols="155,*">
<frame name="gauche" scrolling="no" noresize target="haut_droite" src="defaulta.htm" marginwidth="0" marginheight="5">
<frameset rows="*,45">
<frame name="texte" target="bas_droite" src="defaultb.htm" scrolling="auto" marginwidth="0" marginheight="0" noresize>
<frame name="bas" src="basac.htm" scrolling="no" marginwidth="7" marginheight="15" noresize>
</frameset>
<noframes>
<body>
<p>Cette page utilise des cadres, mais votre navigateur ne les prend pas en charge.</p>
</body>
</noframes>
</frameset>
</html>