I started writing something like Yahoo Pipes a few years ago with the idea of making it easier for non-techies to use (click on DOM elements in a friendly GUI rather than the meta-programming Pipes does). You rapidly realize that 1) you need a PhD in XPath, XSLT and html-tidy (since almost nothing out there is actually XHTML and, pardon my lack of respect, scraping via regex = worst idea ever), 2) scraping recipes are very fragile if you're building any kind of usable data feed from a general website (assuming this is a website you're not in charge of), and 3) websites change often enough that this is almost more trouble than it's worth (this bias comes from doing a long-term cubicle sentence with the evil LexisNexis group during the .com crash, data mining virtually all the info out there on the web [mostly court records though]).
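A rough sketch of what points 2 and 3 look like in practice, in Python with only the standard library. It assumes the page has already been cleaned into well-formed XHTML (by html-tidy or similar), and the recipe, class names and sample page are all made up for illustration. The idea is just that a recipe should fail loudly when the layout moves instead of quietly feeding out garbage.

```python
import xml.etree.ElementTree as ET


class LayoutChanged(Exception):
    """Raised when the page no longer matches what the recipe expects."""


# Hypothetical recipe: one XPath for the repeating row, one per field.
RECIPE = {
    "row": ".//div[@class='case']",
    "fields": {
        "title": "./span[@class='title']",
        "filed": "./span[@class='filed']",
    },
}


def scrape(xhtml, recipe=RECIPE):
    """Apply the recipe to tidied XHTML; raise if the structure has moved."""
    root = ET.fromstring(xhtml)
    rows = root.findall(recipe["row"])
    if not rows:
        raise LayoutChanged("no rows matched " + recipe["row"])
    items = []
    for row in rows:
        item = {}
        for name, path in recipe["fields"].items():
            node = row.find(path)
            # A missing or empty field means the site changed under us.
            if node is None or not (node.text or "").strip():
                raise LayoutChanged("field %r missing" % name)
            item[name] = node.text.strip()
        items.append(item)
    return items


# Made-up sample page standing in for tidied output.
PAGE = """<html><body>
  <div class="case"><span class="title">Smith v. Jones</span><span class="filed">2011-12-01</span></div>
  <div class="case"><span class="title">Doe v. Roe</span><span class="filed">2011-12-05</span></div>
</body></html>"""
```

Calling `scrape(PAGE)` gives back two dicts; rename one class in the markup and the same call raises `LayoutChanged`, which is about as small a redesign as it takes to break a recipe.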
If I were to do it again, I'd have something like StillAlive monitoring the websites I was working with as the QA bit, and figure out how to have it ping something that "turned off" the spider->data-feed that's running elsewhere (which could be done by simply sending an alert to the spider app and having it respond). StillAlive is great at letting you know when the basic structure changes; it's as flexible as cucumber/capybara for that kind of thing and a lot more polished (and it does the alerts stuff). For the data, I'd have my own thing running to convert spider results into an Atom/RSS feed.

I got my code working and was trying to do it as a startup, but idiot-proofing this stuff to a product level that would appeal to the general public was unrealistic, and if it's not idiot-proof then your target audience is too small to make it worth the effort. (IMO that's a big reason Yahoo Pipes, which I thought was a really cool idea, never really took off.)

-ben
(disclaimer - I work with reInteractive.net and Mikel and, occasionally, on StillAlive)

On Thu, Dec 8, 2011 at 8:47 AM, Gregory McIntyre <[email protected]> wrote:
> Mikel, Of course! I am sorry if I oversimplified by calling it merely
> health monitoring. ;-)
>
> Lyndon, It's also an option but I wanted to shout out and see if
> somebody else already had a solution before I got stuck in.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Ruby or Rails Oceania" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/rails-oceania?hl=en.
