I started writing something like Yahoo Pipes a few years ago with the idea
of making it easier for non-techies to use (click on DOM elements in a
friendly GUI rather than the meta-programming Pipes does).  You rapidly
realize that
1) you need to have a PhD in XPath, XSLT and html-tidy (since almost
nothing out there is actually XHTML and, pardon my lack of respect, but
scraping via regex = worst idea ever)
2) scraping recipes are very fragile if you're building any kind of usable
data feed from a general website (assuming this is a website you're not in
charge of)
3) websites change often enough that this is almost more trouble than it's
worth (this bias comes from doing a long-term cubicle sentence with the
evil LexisNexis group during the .com crash, data mining virtually all
info out there on the web [mostly court records, though])
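
A minimal sketch of points 1) and 2), assuming Python and only its standard
library; it is not the original code. The CSS class `case-title`, the
exception `RecipeBroken` and the function `scrape_titles` are hypothetical
names made up for illustration. It uses a tolerant HTML parser instead of
regex, and the recipe fails loudly when the page no longer has the structure
it expects:

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    # Void elements never get a closing tag, so they must not affect depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            # Simplification: assumes tags nested inside a match are closed.
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


class RecipeBroken(Exception):
    """Raised when the page no longer matches what the recipe expects."""


def scrape_titles(html, css_class="case-title", min_expected=1):
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    titles = [t.strip() for t in parser.results if t.strip()]
    # Refuse to return a silently empty result: that usually means the
    # site changed its markup, not that there is no data.
    if len(titles) < min_expected:
        raise RecipeBroken(
            "expected at least %d %r elements, found %d"
            % (min_expected, css_class, len(titles)))
    return titles
```

The parser copes with unquoted attributes and unclosed `<br>` tags, which is
the kind of markup that is not XHTML. The `min_expected` check is the
important part: a recipe that quietly returns nothing after a redesign is
worse than one that raises.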

If I were to do it again I'd have something like StillAlive monitoring the
websites I was working on as the QA bit, and figure out how to have it ping
something that "turned off" the spider->data-feed that's running elsewhere
(which could be done by simply sending an alert to the spider app and
having it respond).  StillAlive is great at letting you know when the basic
structure changes; it's as flexible as cucumber/capybara for that kind of
thing, a lot more polished, and it does the alerts stuff.  For the data
I'd have my own thing running to convert spider results into an Atom/RSS
feed.
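
A rough sketch of that arrangement, assuming Python and its standard library.
Nothing here is StillAlive's real API: `FeedPublisher`, `handle_alert` and
the `site_ok` flag are hypothetical stand-ins for "the monitor pings the
spider app and it responds". While the switch is off, the publisher keeps
serving the last good feed instead of converting fresh spider results:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


class FeedPublisher:
    """Turns spider results into an Atom feed, with an off switch."""

    def __init__(self, feed_id, title):
        self.feed_id = feed_id
        self.title = title
        self.enabled = True
        self.last_good = None   # most recent feed built while enabled

    def handle_alert(self, site_ok):
        # Hypothetical hook: the monitoring service would call this when
        # the watched site's structure check fails or recovers.
        self.enabled = bool(site_ok)
        return self.enabled

    def publish(self, items, now=None):
        if not self.enabled:
            # Site looks broken: do not trust new spider output.
            return self.last_good
        stamp = (now or datetime.now(timezone.utc)).isoformat()
        feed = ET.Element("{%s}feed" % ATOM_NS)
        ET.SubElement(feed, "{%s}id" % ATOM_NS).text = self.feed_id
        ET.SubElement(feed, "{%s}title" % ATOM_NS).text = self.title
        ET.SubElement(feed, "{%s}updated" % ATOM_NS).text = stamp
        for item in items:
            entry = ET.SubElement(feed, "{%s}entry" % ATOM_NS)
            ET.SubElement(entry, "{%s}id" % ATOM_NS).text = item["id"]
            ET.SubElement(entry, "{%s}title" % ATOM_NS).text = item["title"]
            ET.SubElement(entry, "{%s}updated" % ATOM_NS).text = stamp
        self.last_good = ET.tostring(feed, encoding="unicode")
        return self.last_good
```

This only emits the bare id/title/updated elements, so treat it as a
starting point rather than a validated Atom feed (author and link elements
are left out).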

I got my code working and was trying to do it as a startup, but
idiot-proofing this stuff to a product level that'd appeal to the general
public was unrealistic and, if it's not idiot-proof, then your target
audience is too small to make it worth the effort (imo a big reason Yahoo
Pipes, which I thought was a really cool idea, never really took off).

-ben
(disclaimer - I work with reInteractive.net and Mikel and, occasionally, on
StillAlive).


On Thu, Dec 8, 2011 at 8:47 AM, Gregory McIntyre <[email protected]> wrote:

> Mikel, Of course! I am sorry if I oversimplified by calling it merely
> health monitoring. ;-)
>
> Lyndon, It's also an option but I wanted to shout out and see if
> somebody else already had a solution before I got stuck in.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Ruby or Rails Oceania" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/rails-oceania?hl=en.
>
>
