Use whatever scripting language you're most comfortable with; there are DOM
parsing libraries for just about any language.

I've used simple_html_dom for PHP, but nokogiri for Ruby is just as simple,
and I'm sure you can find an equivalent in whatever your language of choice
is.
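As a rough illustration of the DOM-parsing approach, here is a minimal Python sketch. It uses Python's built-in html.parser module instead of simple_html_dom or nokogiri, so it runs with no installs. The `<h2>` section marker and the `PAGE_TEMPLATE` header are hypothetical stand-ins, because the thread doesn't say how sections are marked in the actual documents. It also assumes a fairly flat layout where the section headings sit directly under `<body>`.

```python
import html
from html.parser import HTMLParser

# Hypothetical "new header" wrapped around each section; adjust to taste.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
{body}
</body></html>"""

# Document-level wrapper tags that should not be copied into a section body.
SKIP_TAGS = {"html", "head", "body"}


class SectionSplitter(HTMLParser):
    """Collect one section per heading tag (assumed <h2>); content before the
    first heading is dropped."""

    def __init__(self, heading_tag="h2"):
        super().__init__(convert_charrefs=True)
        self.heading_tag = heading_tag
        self.sections = []  # each: {"title": str, "parts": [str, ...]}
        self._in_heading = False

    def _emit(self, text):
        # Copy markup into the current section body, if one has started.
        if self.sections and not self._in_heading:
            self.sections[-1]["parts"].append(text)

    def handle_starttag(self, tag, attrs):
        if tag == self.heading_tag:
            self.sections.append({"title": "", "parts": []})
            self._in_heading = True
        elif tag not in SKIP_TAGS:
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim.
        if tag not in SKIP_TAGS:
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self.heading_tag:
            self._in_heading = False
        elif tag not in SKIP_TAGS:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        if self._in_heading and self.sections:
            self.sections[-1]["title"] += data
        else:
            # Text arrives unescaped (convert_charrefs=True), so re-escape it.
            self._emit(html.escape(data, quote=False))


def split_document(source_html, heading_tag="h2"):
    """Return a list of (title, page_html) pairs, one per section."""
    parser = SectionSplitter(heading_tag)
    parser.feed(source_html)
    parser.close()
    pages = []
    for section in parser.sections:
        title = " ".join(section["title"].split())
        body = "".join(section["parts"]).strip()
        page = PAGE_TEMPLATE.format(title=html.escape(title), body=body)
        pages.append((title, page))
    return pages


if __name__ == "__main__":
    doc = ("<html><body><p>Intro</p><h2>Scope</h2><p>A &amp; B</p>"
           "<h2>Terms</h2><p>C<br/>D</p></body></html>")
    for title, _page in split_document(doc):
        print(title)  # prints "Scope", then "Terms"
```

Each returned page could then be written to its own file or loaded straight into a UIWebView. Looping `split_document` over a folder of source files would handle a batch of documents in a single run.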

A good cloud-based site to get started is scraperwiki.com; they have
tutorials and a sandbox that you can build your script in, and the scraper
results are stored in an SQLite database you could easily just query from
your iOS app. If your data is public, you can even host a script there that
updates automatically at regular intervals, and since it's cloud-based, you
don't have to worry about memory allocation. (I have a 5000+ page site that
I scrape monthly, and even though it takes the better part of a day to
update, it just works with no problems.)
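To illustrate the SQLite idea, here is a minimal sketch using Python's built-in sqlite3 module. The `sections` table and its columns are hypothetical, and this is plain SQLite, not ScraperWiki's own datastore API. Each run deletes and rewrites one document's rows, so a scheduled re-scrape never leaves stale sections behind.

```python
import sqlite3


def save_sections(conn, document, pages):
    """Replace all stored sections of one document with (title, html) pairs."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """CREATE TABLE IF NOT EXISTS sections (
                   document TEXT NOT NULL,
                   position INTEGER NOT NULL,
                   title    TEXT NOT NULL,
                   html     TEXT NOT NULL,
                   PRIMARY KEY (document, position))"""
        )
        # Drop the old rows first so a shorter re-run leaves no stale sections.
        conn.execute("DELETE FROM sections WHERE document = ?", (document,))
        conn.executemany(
            "INSERT INTO sections (document, position, title, html) "
            "VALUES (?, ?, ?, ?)",
            [(document, i, title, page) for i, (title, page) in enumerate(pages)],
        )


def section_titles(conn, document):
    """Section titles in document order, e.g. rows for a table view."""
    rows = conn.execute(
        "SELECT title FROM sections WHERE document = ? ORDER BY position",
        (document,),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    save_sections(conn, "doc-01", [("Scope", "<p>A</p>"), ("Terms", "<p>B</p>")])
    print(section_titles(conn, "doc-01"))  # ['Scope', 'Terms']
```

Passing a file path to `sqlite3.connect` instead of `":memory:"` produces a database file that an iOS app could open with its own SQLite library.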


Nathaniel Taintor, Designer/Developer
*Golden Apples Design*
http://goldenapplesdesign.com




On Fri, Sep 16, 2011 at 12:49 PM, Michael Hayes <[email protected]> wrote:

> I should probably clarify this better. The pages will be used in a
> UIWebView on iOS. The documents will be split into sections (as they're
> indicated in the documents) and used to populate a UITableView that will
> drill down into individual sections.
>
> The reason I want to use a script instead of doing it manually is that there
> are 20 documents with up to 25 sections each, and we plan to convert more
> documents in the future.
>
>
> On 9/16/11 2:08 PM, Arp Laszlo wrote:
>
> What do you want to do with the pages when all is said & done?  Will they
> be updated frequently?  I would probably build it out on WordPress.
>
> Arp Laszlo
>
> www.echoleaf.com
>
>
>
> On Fri, Sep 16, 2011 at 12:53 PM, Michael Hayes <[email protected]> wrote:
>
>> I have some html pages that need to be cut up into individual pages with a
>> new header and some sparse formatting. I just have no idea where to start
>> looking.
>>
>> What scripting language? What commands? If anybody has any thoughts on
>> where I should start looking, I would be grateful.
>>
>> Thanks,
>> Michael
>>
>> --
>> Michael Hayes
>> http://mhayesdesign.com
>>
>>
>> --
>> Our Web site: http://www.RefreshAustin.org/
>>
>> You received this message because you are subscribed to the Google Groups
>> "Refresh Austin" group.
>>
>> [ Posting ]
>> To post to this group, send email to [email protected]
>> Job-related postings should follow http://tr.im/refreshaustinjobspolicy
>> We do not accept job posts from recruiters.
>>
>> [ Unsubscribe ]
>> To unsubscribe from this group, send email to
>> [email protected]
>>
>> [ More Info ]
>> For more options, visit this group at
>> http://groups.google.com/group/Refresh-Austin
>>
>
> --
> Michael Hayes
> http://mhayesdesign.com
> 512-300-7142
