That recursive algorithm is exactly where I'm stuck. I can't figure out how 
to write the recursive function.

On Tuesday, May 12, 2015 at 20:09:51 UTC+10, Vladimir Gordeev wrote:
>
> At which point did you get stuck?
>
> Simply GET the index page, parse it with Nokogiri, select the <a> tags you 
> are interested in, extract the URLs from their href attributes, and do a 
> recursive GET on those URLs.
> Each page type should have its own function that performs the GET and the 
> parsing.
>
> If you have to fetch a really large number of pages, then you need to store 
> your crawling state somewhere in a database. For example, keep a separate 
> table of URLs to be parsed (with the URL as a unique key), and mark each row 
> as "to be parsed" or "already parsed". Of course, you need to normalize all 
> URLs to avoid duplicates in the table.
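
The URL table described above could be prototyped in memory before moving it to a real database. This is only a sketch: `normalize_url` and `CrawlQueue` are hypothetical names, the Hash stands in for a table with a unique index on the URL column, and the normalization shown (lowercase scheme and host, `/` for an empty path, fragment removed) is just one reasonable choice.

```ruby
require "uri"

# Normalizes a URL so that equivalent spellings map to the same key.
# URI#normalize lowercases the scheme and host and turns an empty path
# into "/"; the fragment is dropped because it never reaches the server.
def normalize_url(raw)
  uri = URI.parse(raw.strip).normalize
  uri.fragment = nil
  uri.to_s
end

# In-memory stand-in for the database table: one row per normalized URL,
# with a state of :pending ("to be parsed") or :parsed ("already parsed").
class CrawlQueue
  def initialize
    @rows = {} # normalized URL => :pending | :parsed
  end

  # Adds a URL unless it is already known. Returns true if a row was added.
  def add(raw_url)
    key = normalize_url(raw_url)
    return false if @rows.key?(key)
    @rows[key] = :pending
    true
  end

  # Returns the oldest URL still waiting to be parsed, or nil when done.
  def next_pending
    @rows.key(:pending)
  end

  # Marks a URL as already parsed.
  def mark_parsed(url)
    @rows[normalize_url(url)] = :parsed
  end

  # Number of URLs still waiting to be parsed.
  def pending_count
    @rows.count { |_url, state| state == :pending }
  end
end
```

A crawl loop would then call `next_pending`, fetch and parse that page, `add` every link it finds, and finish with `mark_parsed`. Because the state lives in rows rather than on the call stack, an interrupted crawl can resume where it stopped once the Hash is replaced by a real table.
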
>
> Also, you could have asked in ror2ru.
>
> On Tue, May 12, 2015 at 7:42 AM, Роман Ярыгин <[email protected]> wrote:
>
>> Hello!
>>
>> I need to grab all of a site's data, preserving its whole tree structure. 
>> Every page has links to its child pages. How can I build the site tree 
>> with Nokogiri? It has to visit pages recursively and scrape all the 
>> directory links, but I can't work out the full algorithm. How do I do that?
>> P.S. I don't need to "save the whole site to disk with HTTrack". The data 
>> will be processed and copied into the new, redesigned version of the 
>> original site.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Ruby on Rails: Talk" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/rubyonrails-talk/db39c272-d353-42be-ae09-4a09fcf4abca%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
