Unfortunately, the COM object doesn't work with Witango.
The INVOKE methods work, so the TidyMemToMem method works great. But
the other methods, like setting the options, don't. As you know, with
Witango, COM objects are hit and miss. If you get it to work, let me know.
Also, the COM object was written in 2001, so it uses a much older
version of Tidy. The Visual C++ source for it is there, but you would
have to rewrite it with the current Tidy source, and probably redo the
methods to make them compatible with Witango.
The current version of Tidy does have command-line executables for
all platforms.
So I was able to get it working, but I had to write a special helper
for Witango that passes the options to tidy through environment vars,
captures the results from STDOUT, and hands them back to Witango. It
does work though, and fairly fast.
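For anyone who wants to try the same idea, here is a rough sketch of that kind of helper in Python. It is not the actual helper. The TIDY_OPT_ prefix and the function names are made up for illustration, and it assumes a tidy executable is on the PATH.

```python
import os
import subprocess

# Hypothetical naming convention (not from the real helper):
# TIDY_OPT_CHAR_ENCODING=latin1 becomes "--char-encoding latin1".
ENV_PREFIX = "TIDY_OPT_"


def build_tidy_args(environ, tidy_bin="tidy"):
    """Turn TIDY_OPT_* variables into `--option value` pairs for the tidy CLI."""
    args = [tidy_bin, "-q"]  # -q suppresses nonessential output
    for name in sorted(environ):
        if name.startswith(ENV_PREFIX):
            option = name[len(ENV_PREFIX):].lower().replace("_", "-")
            args += ["--" + option, environ[name]]
    return args


def run_tidy(html, environ=None):
    """Feed HTML to tidy on STDIN; return (exit code, STDOUT text, STDERR text)."""
    environ = os.environ if environ is None else environ
    proc = subprocess.run(
        build_tidy_args(environ),
        input=html.encode("latin-1", "replace"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # tidy exits with 0 when clean, 1 on warnings, 2 on errors
    return (
        proc.returncode,
        proc.stdout.decode("latin-1"),
        proc.stderr.decode("latin-1", "replace"),
    )
```

With TIDY_OPT_OUTPUT_XML=yes and TIDY_OPT_CHAR_ENCODING=latin1 set, build_tidy_args produces `tidy -q --char-encoding latin1 --output-xml yes`.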
I am going to be employing this for a solution I am finishing this
week, for scraping values from websites on a special project.
You can see how it works on a simple test app on my test server.
http://admin.bigheadtech.com/tidytest.taf?site=http://www.drudgereport.com/
You can change the site URL to test on different sites. I chose the
above because it is one of the ugliest sites I have ever seen, and the
code is probably just as ugly.
I tested a bunch of sites and got the XML back, but one didn't work;
I didn't check the error file.
On the above test, the options are to output well-formed XML, but if
you look at the tidy options, the list is pretty endless. Also, I
specified to add a declaration and use latin1 encoding, because
Witango must have latin1.
--
Robert Garcia
President - BigHead Technology
VP Application Development - eventpix.com
13653 West Park Dr
Magalia, Ca 95954
ph: 530.645.4040 x222 fax: 530.645.4040
[EMAIL PROTECTED] - [EMAIL PROTECTED]
http://bighead.net/ - http://eventpix.com/
On Nov 10, 2005, at 7:38 AM, Rick Sanders wrote:
Hello Robert,
Wow! Thank you! This is VERY helpful! I'm going to start writing
and trying this next week.
Cool how you found the COM object, but I'm also going to try your
REGEX idea as well.
Kind Regards,
Rick Sanders
----- Original Message -----
From: "Robert Garcia" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, November 10, 2005 3:04 AM
Subject: Re: Witango-Talk: Search Engine
Well, thanks for your question, Rick. It got me on a path on
Google, where I found what I was looking for, already done for me.
http://perso.wanadoo.fr/ablavier/TidyCOM/
It is a COM object that uses Tidy. So, if it works in Witango (ask me
in a couple of days), you can use <@URL> to get a string of HTML,
pass it through TidyCOM to get XML output, parse it with the Witango
DOM, and get your values. Should be simple and reliable.
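To show the parsing step outside Witango, here is a small Python sketch. It assumes the page has already been turned into well-formed XML, and the function names are made up for illustration. It matches on local tag names so it works whether or not the output carries the XHTML namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, and lowercase the tag name."""
    return tag.rsplit("}", 1)[-1].lower()


def extract_fields(xml_text):
    """Return the title, meta keywords and meta description of a tidied page."""
    root = ET.fromstring(xml_text)
    fields = {"title": "", "keywords": "", "description": ""}
    for element in root.iter():
        name = local_name(element.tag)
        if name == "title" and not fields["title"]:
            fields["title"] = (element.text or "").strip()
        elif name == "meta":
            key = (element.get("name") or "").lower()
            if key in ("keywords", "description"):
                fields[key] = (element.get("content") or "").strip()
    return fields
```

One caveat: the standard-library parser rejects named entities such as &nbsp; unless they are declared, so the cleaned output may need numeric entities for this to work.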
--
Robert Garcia
President - BigHead Technology
VP Application Development - eventpix.com
13653 West Park Dr
Magalia, Ca 95954
ph: 530.645.4040 x222 fax: 530.645.4040
[EMAIL PROTECTED] - [EMAIL PROTECTED]
http://bighead.net/ - http://eventpix.com/
On Nov 9, 2005, at 9:57 PM, Rick Sanders wrote:
Hey John,
---------------------------[snip]------------------------------
Trying to build a search engine spider. I want to grab the
html file using the <@URL> tag, then omit everything but the
title, keywords, and description and throw it into the database.
---------------------------[snip]------------------------------
I just want to know if it is possible to do this solely with
WiTango. Basically, I'm in a battle with Microsoft Content
Management Server. See, MCMS doesn't have search capability
because Microsoft closed the database. So, I am grabbing the
MCMS postings using an XML control (CMS Rapid) and the posting
comes out in HTML. I want to take the HTML, parse the data I
need out of it, throw it in a database, and query it.
Mondo Search is $10,800.00 first year, and $1800.00 the second &
third year. There's no control to use Coveo with MCMS. So, I
want to build a custom search interface with WiTango.
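The "throw it in a database and query it" part could look roughly like this. It is only a sketch in Python with SQLite, not Witango code. The pages table, its columns and the function names are assumptions chosen to match the title, keywords and description fields mentioned above.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a small page index; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "link TEXT PRIMARY KEY, title TEXT, keywords TEXT, description TEXT)"
    )
    return conn


def store_page(conn, link, title, keywords, description):
    """Insert a page, replacing any earlier row for the same link."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (link, title, keywords, description) "
        "VALUES (?, ?, ?, ?)",
        (link, title, keywords, description),
    )
    conn.commit()


def search(conn, term):
    """Return (link, title) pairs whose indexed fields contain the term."""
    pattern = "%" + term + "%"
    rows = conn.execute(
        "SELECT link, title FROM pages "
        "WHERE title LIKE ? OR keywords LIKE ? OR description LIKE ? "
        "ORDER BY link",
        (pattern, pattern, pattern),
    )
    return rows.fetchall()
```

A plain LIKE match is the simplest thing that works; a real search interface would probably want ranking or a full-text index on top of this.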
Thanks!
Rick
----- Original Message -----
From: "John McGowan" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, November 09, 2005 11:39 AM
Subject: Re: Witango-Talk: Search Engine
Rick,
What's your question?
/John
Rick Sanders wrote:
Hey Bill,
Thanks for the link!
But, I'd still love to do this completely in WiTango.
Rick
----- Original Message -----
From: "William M Conlon" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, November 08, 2005 8:08 PM
Subject: Re: Witango-Talk: Search Engine
I'm a broken record on this, but here goes:
http://www.swish-e.org has a very nice Perl spider which will
do this for you (well, you'll have to write a Perl callback
function to INSERT INTO (link, title, keywords, description)).
But the nice thing about this is that it's already integrated
with an HTML parser, to pull this out for you.
On Nov 8, 2005, at 4:48 PM, Rick Sanders wrote:
Hey Guys,
Trying to build a search engine spider. I want to grab the
html file using the <@URL> tag, then omit everything but the
title, keywords, and description and throw it into the database.
I know I can do this with other platforms, but would like to
do it with WiTango.
Rick Sanders
President
519-498-7994
www.webenergy-sw.com
Bill
William M. Conlon, P.E., Ph.D.
To the Point
345 California Avenue Suite 2
Palo Alto, CA 94306
vox: 650.327.2175 (direct)
fax: 650.329.8335
mobile: 650.906.9929
e-mail: mailto:[EMAIL PROTECTED]
web: http://www.tothept.com
________________________________________________________________________
TO UNSUBSCRIBE: Go to http://www.witango.com/developer/maillist.taf