I'm stuck on exactly that recursive algorithm. I can't figure out how to write the recursive function.

On Tuesday, May 12, 2015 at 20:09:51 UTC+10, Vladimir Gordeev wrote:
>
> At which point did you get stuck?
>
> Simply GET the index page, parse it with Nokogiri, select the <a> tags you
> are interested in, extract the URLs from their href attributes, and do a
> recursive GET on those URLs.
> Each page type should have its own function that performs the GET and the
> parsing.
>
> If you have to fetch a fairly large number of pages, you need to store
> your crawl state somewhere in a database. For example, keep a separate table
> of URLs to be parsed (with the URL as a unique key), and mark rows as "to be
> parsed" or "already parsed". Of course, you need to normalize all URLs to
> avoid duplicates in the table.
>
> You could also have asked in ror2ru.
>
> On Tue, May 12, 2015 at 7:42 AM, Роман Ярыгин <[email protected]> wrote:
>
>> Hello!
>>
>> I need to grab all the data from a site, preserving its whole tree
>> structure. Every page has links to its child pages. How do I build the
>> site tree with Nokogiri? It has to visit pages recursively and scrape all
>> the directory links, but I can't work out the full algorithm. How do I do
>> that?
>> P.S. I don't need to "save the whole site to disk with HTTrack". The data
>> will be processed and copied into the new, redesigned version of the
>> original site.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Ruby on Rails: Talk" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/rubyonrails-talk/db39c272-d353-42be-ae09-4a09fcf4abca%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
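
The approach described in the quoted reply (GET a page, parse it with Nokogiri, follow the <a> links recursively, normalize URLs so nothing is visited twice) could be sketched roughly as below. This is only a minimal sketch under a few assumptions: the names `normalize_url`, `fetch_html`, `extract_hrefs` and `crawl` are made up for illustration, the crawl stays on the starting host, and the "already parsed" state lives in an in-memory Set rather than the database table suggested above.

```ruby
require 'net/http'
require 'set'
require 'uri'

# Resolve href against the page it was found on and drop the #fragment, so
# one page is not recorded under several spellings.
# Returns nil for links that are not http(s) or cannot be parsed.
def normalize_url(page_url, href)
  uri = URI.join(page_url, href.to_s.strip)
  return nil unless uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
  uri.fragment = nil
  uri.to_s
rescue URI::Error
  nil
end

# Hypothetical default fetcher: plain GET, body on success, nil otherwise.
def fetch_html(url)
  res = Net::HTTP.get_response(URI(url))
  res.is_a?(Net::HTTPSuccess) ? res.body : nil
end

# Hypothetical default link extractor: href values of all <a> tags.
def extract_hrefs(html)
  require 'nokogiri' # gem; required here so the traversal can be tried without it
  Nokogiri::HTML(html).css('a[href]').map { |a| a['href'] }
end

# Depth-first recursive visit of url and every same-host page reachable from it.
# Returns the children of url as a nested hash: { child_url => { ... } }.
# visited is shared by all recursive calls, so each URL is fetched only once.
def crawl(url, visited = Set.new, fetch: method(:fetch_html), hrefs: method(:extract_hrefs))
  visited << url
  html = fetch.call(url)
  return {} if html.nil?

  host = URI(url).host
  children = hrefs.call(html).map { |h| normalize_url(url, h) }.compact.uniq
  children.each_with_object({}) do |child, tree|
    next if visited.include?(child) || URI(child).host != host
    tree[child] = crawl(child, visited, fetch: fetch, hrefs: hrefs)
  end
end
```

With a root such as `root = 'http://example.com/'` (a placeholder), the whole tree would be `{ root => crawl(root) }`. A page linked from several places appears only under the first parent that reached it. For a large site the Set would be replaced by the URL table from the reply above, and the recursion probably by a queue of "to be parsed" rows, since very deep link chains can exhaust the call stack.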

