Use whatever scripting language you're most comfortable with - there are DOM parsing libraries for just about any language.
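Here is a minimal sketch of the kind of script being discussed in this thread. It rests on a few assumptions. It assumes each section in the source pages starts with an `<h2>` heading; the thread doesn't say how sections are marked up. It also swaps in Python's standard-library `html.parser` for simple_html_dom or nokogiri, so it runs without extra installs. That parser is event-based rather than a full DOM, so treat it as a stand-in for those libraries, not an example of them. `SectionSplitter`, `split_sections`, `wrap_page`, `store_sections`, and the `sections` table are hypothetical names. The page template is only a placeholder for "a new header and some sparse formatting".

```python
import sqlite3
from html import escape
from html.parser import HTMLParser


class SectionSplitter(HTMLParser):
    """Collect (title, body chunks) pairs, starting a new section at each <h2>.

    Assumption (hypothetical markup): every section begins with an <h2> heading.
    Anything before the first <h2> is dropped.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.sections = []         # list of [title, [html chunks]]
        self._in_heading = False

    def _collecting(self):
        # True when we are inside a section body (not inside its heading).
        return bool(self.sections) and not self._in_heading

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_heading = True
            self.sections.append(["", []])
        elif self._collecting():
            # Keep the raw tag text so attributes survive unchanged.
            self.sections[-1][1].append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: keep the raw tag, no separate end tag.
        if self._collecting():
            self.sections[-1][1].append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_heading = False
        elif self._collecting():
            self.sections[-1][1].append(f"</{tag}>")

    def handle_data(self, data):
        if not self.sections:
            return
        if self._in_heading:
            self.sections[-1][0] += data
        else:
            # convert_charrefs=True decoded entities, so re-escape for HTML output.
            self.sections[-1][1].append(escape(data, quote=False))


def split_sections(html_text):
    """Return a list of (title, body_html) tuples, one per <h2> section."""
    parser = SectionSplitter()
    parser.feed(html_text)
    parser.close()
    return [(title.strip(), "".join(chunks).strip()) for title, chunks in parser.sections]


def wrap_page(title, body_html, header_html):
    """Hypothetical template: the new shared header plus minimal formatting."""
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n"
        f"<body>{header_html}<h1>{escape(title)}</h1>\n{body_html}</body></html>\n"
    )


def store_sections(conn, doc_name, sections, header_html):
    """Write one row per section; `position` preserves the original order."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sections "
        "(doc TEXT, position INTEGER, title TEXT, html TEXT)"
    )
    conn.executemany(
        "INSERT INTO sections VALUES (?, ?, ?, ?)",
        [(doc_name, i, title, wrap_page(title, body, header_html))
         for i, (title, body) in enumerate(sections)],
    )
    conn.commit()


if __name__ == "__main__":
    sample = ("<h2>Intro</h2><p>Hello &amp; welcome.</p>"
              "<h2>Usage</h2><p>Step one.<br/>Step two.</p>")
    conn = sqlite3.connect(":memory:")  # use a file path to keep the database
    store_sections(conn, "doc1", split_sections(sample), "<header>My App</header>")
    for row in conn.execute("SELECT position, title FROM sections ORDER BY position"):
        print(row)
```

You could pass a file path such as `sections.db` to `sqlite3.connect` instead of `":memory:"`. That would produce a database file an iOS app could presumably bundle and query, one row per table-view cell, in the spirit of the SQLite suggestion in this reply.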
I've used simple_html_dom for PHP, but nokogiri for Ruby is just as simple, and I'm sure you can find an equivalent in whatever your language of choice is. A good cloud-based site to get started is scraperwiki.com: they have tutorials and a sandbox that you can build your script in, and the scraper results are stored in an SQLite database that you could query directly from your iOS app. If your data is public, you can even host a script there that updates automatically at regular intervals, and since it's cloud-based, you don't have to worry about memory allocation. (I have a 5000+ page site that I scrape monthly, and even though it takes the better part of a day to update, it just works with no problems.)

Nathaniel Taintor, Designer/Developer
*Golden Apples Design*
http://goldenapplesdesign.com

On Fri, Sep 16, 2011 at 12:49 PM, Michael Hayes <[email protected]> wrote:
> I should probably clarify this. The pages will be used in a UIWebView on iOS. The documents will be split into sections (as they're indicated in the documents) and used to populate a UITableView that will drill down into individual sections.
>
> The reason I want to use a script instead of doing this manually is that there are 20 documents with up to 25 sections each, and we plan to convert more documents in the future.
>
> On 9/16/11 2:08 PM, Arp Laszlo wrote:
>
> What do you want to do with the pages when all is said & done? Will they be updated frequently? I would probably build it out on WordPress.
>
> Arp Laszlo
>
> www.echoleaf.com
>
> On Fri, Sep 16, 2011 at 12:53 PM, Michael Hayes <[email protected]> wrote:
>
>> I have some HTML pages that need to be cut up into individual pages with a new header and some sparse formatting. I just have no idea where to start looking.
>>
>> What scripting language? What commands? If anybody has any thoughts on where I should start looking, I would be grateful.
>>
>> Thanks,
>> Michael
>>
>> --
>> Michael Hayes
>> http://mhayesdesign.com
>>
>> --
>> Our Web site: http://www.RefreshAustin.org/
>>
>> You received this message because you are subscribed to the Google Groups "Refresh Austin" group.
>>
>> [ Posting ]
>> To post to this group, send email to [email protected]
>> Job-related postings should follow http://tr.im/refreshaustinjobspolicy
>> We do not accept job posts from recruiters.
>>
>> [ Unsubscribe ]
>> To unsubscribe from this group, send email to [email protected]
>>
>> [ More Info ]
>> For more options, visit this group at http://groups.google.com/group/Refresh-Austin
>
> --
> Michael Hayes
> http://mhayesdesign.com
> 512-300-7142
