On Sat, Dec 12, 2009 at 2:56 AM, kalyan wrote:
> I'm doing one module in my site, there I need to import user blog into
> my site. I can use RSS feeds to read the blog information but using
> RSS feeds I'm not getting entire information. So, I need to scrape the
> user blog page. How to scrape a pages without knowing its html
> structure of a page?

Unless you want the entire page, you need to know something about the
page structure.

Well. If the page is even reasonably marked up (DIVs/Ps-wise) and you
create an array of block elements, you *might* get away with the
assumption that the ones with significant amounts of text (for some
value of "significant") are the actual blog post. Might.

I'd imagine a lot more going into that heuristic, since you're looking
for an AI solution :-)

Good luck,
-- 
Hassan Schroeder
twitter: @hassan
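
A rough sketch of the "blocks with significant text" heuristic mentioned above, in plain Ruby with no gems. Everything here is an assumption for illustration: the method names, the 200-character threshold, and the choice to look only at <p> elements (which do not nest, so a regex can get away with it). On real pages a proper HTML parser would be a safer choice than a regex, and HTML entities are left undecoded.

```ruby
# Hypothetical threshold for "significant" text; tune it per site.
MIN_CHARS = 200

# Return the text of every <p> element in the HTML string, tags stripped.
# Only <p> is handled because paragraphs do not nest; nested <div>s would
# need a real parser.
def paragraph_texts(html)
  html.scan(%r{<p\b[^>]*>(.*?)</p>}mi).flatten.map do |inner|
    text = inner.gsub(/<[^>]+>/, " ") # drop inline tags such as <a> or <em>
    text.gsub(/\s+/, " ").strip       # collapse runs of whitespace
  end
end

# Guess at the post body: paragraphs at or above the length threshold.
def likely_post_text(html, min_chars = MIN_CHARS)
  paragraph_texts(html).select { |text| text.length >= min_chars }
end
```

Calling likely_post_text(page_html) returns an array of candidate paragraphs. Short blocks such as navigation links or bylines fall below the threshold and are dropped, which is the whole (fragile) assumption behind the heuristic.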

