Here is some more. I am interested in this because we have some future
projects, and I have been pondering a way to use the DOM to parse HTML
better.
http://java-source.net/open-source/html-parsers
And another:
http://tidy.sourceforge.net/
Tidy has libs for all platforms. We are experimenting with using
Tidy so that you could take any HTML, convert it to well-formed XHTML,
and then pass it into an XML parser, even Witango's.
So to get the title of an HTML doc, if local$xml was the DOM, it would
just be:
<@assign local$title <@elementvalue local$xml xpath="/html/head/title">>
or
<@assign local$title <@elementvalue local$xml xpath="//title">>
You would have to use a case-insensitive XQL query, since HTML tag
names can show up as TITLE, Title, or title.
For now, we are still using regex.
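For comparison, here is a rough sketch of the same idea outside Witango,
in Python, using only the standard library's html.parser. It is not what
we run; the names HeadFieldsParser and extract_head_fields are made up
for illustration. The parser lowercases tag and attribute names itself,
which takes care of the case problem mentioned above.

```python
from html.parser import HTMLParser


class HeadFieldsParser(HTMLParser):
    """Collects the <title> text and the keywords/description <meta> values."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "keywords": "", "description": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <TITLE> and
        # <META NAME=...> match without any extra work.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            # Attribute values keep their case, so normalise the name here.
            name = (attr.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.fields[name] = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several pieces, so accumulate it.
        if self._in_title:
            self.fields["title"] += data


def extract_head_fields(html_text):
    """Return a dict with 'title', 'keywords', and 'description' strings."""
    parser = HeadFieldsParser()
    parser.feed(html_text)
    parser.close()
    parser.fields["title"] = parser.fields["title"].strip()
    return parser.fields
```

Calling extract_head_fields(page_source) on a fetched page gives back a
dict with the three fields; anything missing comes back as an empty
string.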
--
Robert Garcia
President - BigHead Technology
VP Application Development - eventpix.com
13653 West Park Dr
Magalia, Ca 95954
ph: 530.645.4040 x222 fax: 530.645.4040
[EMAIL PROTECTED] - [EMAIL PROTECTED]
http://bighead.net/ - http://eventpix.com/
On Nov 9, 2005, at 9:57 PM, Rick Sanders wrote:
Hey John,
---------------------------[snip]------------------------------
Trying to build a search engine spider. I want to grab the
html file using the <@URL> tag, then omit everything but the
title, keywords, and description and throw it into the database.
---------------------------[snip]------------------------------
I just want to know if it is possible to do this solely with
WiTango. Basically, I'm in a battle with Microsoft Content
Management Server. See, MCMS doesn't have search capability because
Microsoft closed the database. So, I am grabbing the MCMS postings
using an XML control (CMS Rapid) and the posting comes out in HTML.
I want to take the HTML, parse the data I need out of it, throw it
in a database, and query it.
Mondo Search is $10,800.00 first year, and $1800.00 the second &
third year. There's no control to use Coveo with MCMS. So, I want
to build a custom search interface with WiTango.
Thanks!
Rick
----- Original Message -----
From: "John McGowan" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, November 09, 2005 11:39 AM
Subject: Re: Witango-Talk: Search Engine
Rick,
What's your question?
/John
Rick Sanders wrote:
Hey Bill,
Thanks for the link!
But, I'd still love to do this completely in WiTango.
Rick
----- Original Message -----
From: "William M Conlon" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, November 08, 2005 8:08 PM
Subject: Re: Witango-Talk: Search Engine
I'm a broken record on this, but here goes:
http://www.swish-e.org has a very nice Perl spider which will do
this for you (well, you'll have to write a Perl callback function
to INSERT INTO (link, title, keywords, description)).
But the nice thing about this is that it's already integrated
with an HTML parser, to pull this out for you.
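The database side of such a callback is small. Here is a minimal sketch
of the INSERT step, written in Python with sqlite3 rather than Perl, and
not tied to the swish-e spider's actual callback interface. The table
name "pages" and the function names open_index and store_page are
assumptions for illustration only.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the index database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " link TEXT PRIMARY KEY,"
        " title TEXT, keywords TEXT, description TEXT)"
    )
    return conn


def store_page(conn, link, title, keywords, description):
    """Insert one page, replacing any earlier row for the same link."""
    # Placeholders keep page text from being interpreted as SQL.
    conn.execute(
        "INSERT OR REPLACE INTO pages (link, title, keywords, description)"
        " VALUES (?, ?, ?, ?)",
        (link, title, keywords, description),
    )
    conn.commit()
```

Keying on the link means a re-crawl overwrites the old row instead of
piling up duplicates.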
On Nov 8, 2005, at 4:48 PM, Rick Sanders wrote:
Hey Guys,
Trying to build a search engine spider. I want to grab the
html file using the <@URL> tag, then omit everything but the
title, keywords, and description and throw it into the database.
I know I can do this with other platforms, but would like to do
it with WiTango.
Rick Sanders
President
519-498-7994
www.webenergy-sw.com
Bill
William M. Conlon, P.E., Ph.D.
To the Point
345 California Avenue Suite 2
Palo Alto, CA 94306
vox: 650.327.2175 (direct)
fax: 650.329.8335
mobile: 650.906.9929
e-mail: mailto:[EMAIL PROTECTED]
web: http://www.tothept.com
________________________________________________________________________
TO UNSUBSCRIBE: Go to http://www.witango.com/developer/maillist.taf