Re: [SLUG] Invision phpBB Site Content ripping

2009-10-06 Thread Daniel Pittman
Amos Shapira  writes:
> 2009/10/6 Kyle :
>> Hi Folks,
>>
>> how hard/easy would it be to get something written which could log onto one
>> IP.Board forum, crawl that site and download the content only, to import
>> into another IP.board db?
>>
>> So users, forums, threads, PM's, user galleries, etc.
>>
>> Assuming one doesn't have access to the DB from the original site.
>
> We used Perl WWW::Mechanize (http://search.cpan.org/dist/WWW-Mechanize/) to
> write up something similar to Forum Proxy Leacher. I'll try to get
> permission to release it.

If you do go down this path, I ♥ the HTML::TreeBuilder::XPath module[1], which
will parse the HTML into a DOM-like structure, then let you get at the content
with XPath expressions.

I find that extremely powerful for accessing the content in a meaningful
fashion, either through the XPath queries or through the TreeBuilder
per-node objects.
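
As a rough illustration of the parse-then-query idea, here is a minimal
sketch. It is not HTML::TreeBuilder::XPath: it swaps in Python's standard
xml.etree.ElementTree, which supports only a small XPath subset and needs
well-formed markup. The page snippet, the class names (post, author, body)
and the extract_posts helper are all hypothetical; a real IP.Board page will
look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup. Real forum pages use different class
# names and are rarely valid XML, so a forgiving HTML parser is needed there.
PAGE = """
<html><body>
  <div class="post" id="p1">
    <span class="author">kyle</span>
    <div class="body">First post</div>
  </div>
  <div class="post" id="p2">
    <span class="author">amos</span>
    <div class="body">Second post</div>
  </div>
</body></html>
"""


def extract_posts(markup):
    """Parse markup into a tree, then pull each post out with XPath-style paths."""
    root = ET.fromstring(markup)
    posts = []
    # Select every <div class="post"> anywhere below the root.
    for node in root.findall(".//div[@class='post']"):
        posts.append({
            "id": node.get("id"),
            # Paths here are relative to the current post node.
            "author": node.findtext("span[@class='author']"),
            "body": node.findtext("div[@class='body']"),
        })
    return posts


if __name__ == "__main__":
    for post in extract_posts(PAGE):
        print(post)
```

The point is the shape of the approach: one query selects the repeating
container, and short relative queries pick the fields out of each one.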

Daniel

Footnotes: 
[1]  http://search.cpan.org/~mirod/HTML-TreeBuilder-XPath-0.11/

-- 
✣ Daniel Pittman  ✉ dan...@rimspace.net  ☎ +61 401 155 707
   ♽ made with 100 percent post-consumer electrons
   Looking for work?  Love Perl?  In Melbourne, Australia?  We are hiring.
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html


Re: [SLUG] Invision phpBB Site Content ripping

2009-10-05 Thread Amos Shapira
2009/10/6 Kyle :
> Hi Folks,
>
> how hard/easy would it be to get something written which could log onto one
> IP.Board forum, crawl that site and download the content only, to import
> into another IP.board db?
>
> So users, forums, threads, PM's, user galleries, etc.
>
> Assuming one doesn't have access to the DB from the original site.

We used Perl WWW::Mechanize
(http://search.cpan.org/dist/WWW-Mechanize/) to write up something
similar to Forum Proxy Leacher. I'll try to get permission to release
it.

--Amos


Re: [SLUG] Invision phpBB Site Content ripping

2009-10-05 Thread Mark Walkom
You could probably scrape the content using curl, sed, etc. and dump it into a
file to import into a DB.
But I am not aware of a specific app/script that does this.
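
A minimal sketch of the scrape-and-dump step, assuming the page has already
been fetched (for example with curl) and is held in a string. It uses
Python's standard html.parser and csv modules rather than sed. The
showtopic=NNN link pattern, the sample markup, and the names TopicLinkParser
and topics_to_csv are assumptions made for illustration; check them against
the real forum's URLs before relying on any of it.

```python
import csv
import io
import re
from html.parser import HTMLParser


class TopicLinkParser(HTMLParser):
    """Collect (topic id, link text) for links that look like topic URLs."""

    def __init__(self):
        super().__init__()
        self.topics = []
        self._current = None  # topic whose link text is being collected

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumed URL shape: index.php?showtopic=123
        match = re.search(r"showtopic=(\d+)", href)
        if match:
            self._current = {"topic_id": int(match.group(1)), "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.topics.append(self._current)
            self._current = None


def topics_to_csv(html_text, out):
    """Write one CSV row per topic link found; return the number of topics."""
    parser = TopicLinkParser()
    parser.feed(html_text)
    parser.close()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["topic_id", "title"])
    for topic in parser.topics:
        writer.writerow([topic["topic_id"], topic["title"]])
    return len(parser.topics)


if __name__ == "__main__":
    sample = (
        '<a href="index.php?showtopic=12">Hello, world</a>'
        '<a href="index.php?showforum=3">General</a>'
        '<a href="index.php?showtopic=40">Backups</a>'
    )
    buf = io.StringIO()
    topics_to_csv(sample, buf)
    print(buf.getvalue(), end="")
```

CSV is used here only because most databases can bulk-load it; the same
rows could just as easily be written out as INSERT statements.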

2009/10/6 Kyle :
> Just for the record, it is a lost forum of which I have been a long time
> contributor. And we now wish to migrate to a new setup. So nothing
> diabolical.
>
> But it appears we may be out of luck?
>
> 
> Kind Regards
>
> Kyle
>
>
>
> Mark Walkom wrote:
>>
>> Well apart from the ethics of ripping off someones forums (unless
>> they are yours that is)
>>
>>
>


Re: [SLUG] Invision phpBB Site Content ripping

2009-10-05 Thread Kyle
Just for the record, it is a lost forum of which I have been a long time 
contributor. And we now wish to migrate to a new setup. So nothing 
diabolical.


But it appears we may be out of luck?


Kind Regards

Kyle



Mark Walkom wrote:
> Well apart from the ethics of ripping off someone's forums (unless
> they are yours that is)




Re: [SLUG] Invision phpBB Site Content ripping

2009-10-05 Thread Mark Walkom
Well apart from the ethics of ripping off someone's forums (unless
they are yours that is)

I mod on a few forums running IPB (2/3) and you can't get access to
others' PMs, even via the admin control panel.
As far as I know you would need direct access to the database.


2009/10/6 Kyle :
> Hi Folks,
>
> how hard/easy would it be to get something written which could log onto one
> IP.Board forum, crawl that site and download the content only, to import
> into another IP.board db?
>
> So users, forums, threads, PM's, user galleries, etc.
>
> Assuming one doesn't have access to the DB from the original site.
>
> --
> 
> Kind Regards
>
> Kyle
>


[SLUG] Invision phpBB Site Content ripping

2009-10-05 Thread Kyle

Hi Folks,

how hard/easy would it be to get something written which could log onto 
one IP.Board forum, crawl that site and download the content only, to 
import into another IP.board db?


So users, forums, threads, PM's, user galleries, etc.

Assuming one doesn't have access to the DB from the original site.

--

Kind Regards

Kyle
