Re: [SLUG] Invision phpBB Site Content ripping
Amos Shapira writes:

> 2009/10/6 Kyle:
>> Hi Folks,
>>
>> how hard/easy would it be to get something written which could log onto one
>> IP.Board forum, crawl that site and download the content only, to import
>> into another IP.board db?
>>
>> So users, forums, threads, PM's, user galleries, etc.
>>
>> Assuming one doesn't have access to the DB from the original site.
>
> We used Perl WWW::Mechanize (http://search.cpan.org/dist/WWW-Mechanize/) to
> write up something similar to Forum Proxy Leacher. I'll try to get
> permission to release it.

If you do go down this path I ♥ the HTML::TreeBuilder::XPath module[1], which will parse the HTML into a DOM-like structure, then let you get at the content with XPath expressions.

I find that extremely powerful in accessing the content in a meaningful fashion, either through the XPath queries, or through the TreeBuilder per-instance objects.

Daniel

Footnotes:
[1] http://search.cpan.org/~mirod/HTML-TreeBuilder-XPath-0.11/

--
✣ Daniel Pittman ✉ dan...@rimspace.net ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons
Looking for work? Love Perl? In Melbourne, Australia? We are hiring.
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
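As a rough illustration of the parse-then-query idea described in this message, here is a minimal sketch in Python rather than Perl. It swaps in the standard library's xml.etree.ElementTree, which supports only a limited XPath subset and needs well-formed markup, so it is not the HTML::TreeBuilder::XPath API. The sample markup, the class names and the extract_posts helper are all hypothetical; a real IP.Board page will look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for one thread page.
# Real IP.Board pages use different class names and are rarely valid XML.
PAGE = """
<html><body>
  <div class="post" id="post-1">
    <span class="author">kyle</span>
    <div class="body">First post</div>
  </div>
  <div class="post" id="post-2">
    <span class="author">amos</span>
    <div class="body">A reply</div>
  </div>
</body></html>
"""


def extract_posts(page):
    """Return a list of (post id, author, body) tuples from one page."""
    root = ET.fromstring(page)
    posts = []
    # Select every post container anywhere in the tree.
    for node in root.findall(".//div[@class='post']"):
        # Paths here are relative to the post node itself.
        author = node.findtext("span[@class='author']", default="").strip()
        body = node.findtext("div[@class='body']", default="").strip()
        posts.append((node.get("id"), author, body))
    return posts


if __name__ == "__main__":
    for post in extract_posts(PAGE):
        print(post)
```

The point of the design is that the queries name the structure you care about (a post, its author, its body) instead of matching on raw text, which tends to survive small layout changes better.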
Re: [SLUG] Invision phpBB Site Content ripping
2009/10/6 Kyle:

> Hi Folks,
>
> how hard/easy would it be to get something written which could log onto one
> IP.Board forum, crawl that site and download the content only, to import
> into another IP.board db?
>
> So users, forums, threads, PM's, user galleries, etc.
>
> Assuming one doesn't have access to the DB from the original site.

We used Perl WWW::Mechanize (http://search.cpan.org/dist/WWW-Mechanize/) to write up something similar to Forum Proxy Leacher. I'll try to get permission to release it.

--Amos
Re: [SLUG] Invision phpBB Site Content ripping
You could probably scrape the content using curl, sed, etc. and dump it into a file to import into a DB. But I am not aware of a specific app or script that does this.

2009/10/6 Kyle:

> Just for the record, it is a lost forum of which I have been a long time
> contributor. And we now wish to migrate to a new setup. So nothing
> diabolical.
>
> But it appears we may be out of luck?
>
> Kind Regards
>
> Kyle
>
> Mark Walkom wrote:
>>
>> Well apart from the ethics of ripping off someone's forums (unless
>> they are yours that is)
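The scrape-and-dump suggestion in this reply could look roughly like the sketch below. It is written in Python rather than with curl and sed, and it assumes the pages have already been fetched and split into (thread, author, HTML body) rows. The strip_tags and dump_posts helpers, the column names and the sample row are all hypothetical, and the target board's real import format is not known from this thread.

```python
import csv
import io
import re

# Matches any HTML tag, in the spirit of a sed one-liner.
TAG = re.compile(r"<[^>]+>")


def strip_tags(fragment):
    """Crude tag removal; fine for simple markup, not a real HTML parser."""
    text = TAG.sub(" ", fragment)
    # Collapse the whitespace left behind by removed tags.
    return re.sub(r"\s+", " ", text).strip()


def dump_posts(rows, out):
    """Write (thread, author, html_body) rows as CSV with a header line."""
    writer = csv.writer(out)
    writer.writerow(["thread", "author", "body"])
    for thread, author, html_body in rows:
        writer.writerow([thread, author, strip_tags(html_body)])


if __name__ == "__main__":
    sample = [("Welcome", "kyle", "<p>Hello <b>folks</b></p>")]
    buf = io.StringIO()
    dump_posts(sample, buf)
    print(buf.getvalue())
```

A flat file like this only covers what a logged-in member can see; it says nothing about PMs or other data that is not exposed through the web pages.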
Re: [SLUG] Invision phpBB Site Content ripping
Just for the record, it is a lost forum of which I have been a long-time contributor. And we now wish to migrate to a new setup. So nothing diabolical.

But it appears we may be out of luck?

Kind Regards

Kyle

Mark Walkom wrote:
> Well apart from the ethics of ripping off someone's forums (unless they are yours that is)
Re: [SLUG] Invision phpBB Site Content ripping
Well, apart from the ethics of ripping off someone's forums (unless they are yours, that is), I mod on a few forums running IPB (2/3) and you can't get access to others' PMs, even via the admin control panel. As far as I know you would need direct access to the database.

2009/10/6 Kyle:

> Hi Folks,
>
> how hard/easy would it be to get something written which could log onto one
> IP.Board forum, crawl that site and download the content only, to import
> into another IP.board db?
>
> So users, forums, threads, PM's, user galleries, etc.
>
> Assuming one doesn't have access to the DB from the original site.
>
> --
>
> Kind Regards
>
> Kyle
[SLUG] Invision phpBB Site Content ripping
Hi Folks,

how hard/easy would it be to get something written which could log onto one IP.Board forum, crawl that site and download the content only, to import into another IP.board db?

So users, forums, threads, PM's, user galleries, etc.

Assuming one doesn't have access to the DB from the original site.

--

Kind Regards

Kyle