On Fri, Mar 15, 2013 at 8:12 AM, Jaap Karssenberg
<jaap.karssenb...@gmail.com> wrote:
> On Thu, Mar 14, 2013 at 5:39 PM, Michael Spranger
> <mikeitsecur...@gmail.com> wrote:
>> How much effort would it take to get that self contained HTML to import into
>> zim? I am not a scripter so I am of no help there.
>
> I got some code to unpack the stand alone HTML, that part is easy.
> Next step will be converting the HTML to text while preserving at
> least images and bullet lists. Some other markup can be preserved, but
> most may get lost. Tables will end up as lines of text.
>
> One limitation I see at the moment for the OneNote importer is that
> when I export a section from OneNote I get multiple pages in a single
> HTML file. Unfortunately the start of a new page is not clearly marked
> in the HTML, so splitting up in multiple pages will not be very
> robust.
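
For illustration only, the two steps mentioned in the quoted message (unpacking the stand-alone HTML, then converting it to text while keeping bullet lists and images) could be sketched roughly as below. This is not the actual importer code. It assumes the .mht file is an ordinary MIME multipart message, so Python's standard `email` module can split it. The names `unpack_mht`, `HtmlToText` and `html_to_text` are hypothetical, and the `{{...}}` image marker is an assumption about the target wiki syntax. Headings, bold and italic are flattened to plain text in this sketch.

```python
import email
from html.parser import HTMLParser

# Tags that start or end a line of output in this sketch.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "tr", "table",
              "h1", "h2", "h3", "h4", "h5", "h6"}


def unpack_mht(data):
    """Split MHT bytes into {name: (content_type, payload_bytes)}."""
    msg = email.message_from_bytes(data)
    parts = {}
    for index, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue  # containers carry no payload of their own
        name = part.get("Content-Location") or "part-%d" % index
        # decode=True undoes base64 / quoted-printable transfer encoding
        parts[name] = (part.get_content_type(), part.get_payload(decode=True))
    return parts


class HtmlToText(HTMLParser):
    """Collect text line by line, keeping bullets and image references."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self._buf = []
        self._prefix = ""

    def _flush(self):
        # Collapse runs of whitespace, then emit the line if non-empty.
        text = " ".join("".join(self._buf).split())
        if text:
            self.lines.append(self._prefix + text)
            self._prefix = ""
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        if tag == "li":
            self._prefix = "* "
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Assumed image marker; adjust to the real wiki syntax.
                self._buf.append("{{%s}}" % src)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        if tag == "li":
            self._prefix = ""  # do not leak a bullet onto later text

    def handle_data(self, data):
        self._buf.append(data)


def html_to_text(html):
    parser = HtmlToText()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit whatever is still buffered
    return "\n".join(parser.lines)
```

Tables come out as plain lines of text here, and nothing in this sketch tries to detect where a new page starts, which is the non-robust part described above.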

OK, I also found some code I hacked together some time ago to import
fragments of HTML. I will have to put the two together to get a real
solution.

What I need at this point to proceed is some test data:

* .mht export of a notebook section containing multiple pages
* include some images
* include some bullet lists
* include headings and sub-headings (level 1 / 2 / ..)
* use bold / italic / ...

Please make sure that such test data is not private and is free of
copyright, so I can add it to zim's test suite eventually. Try to make
it look like realistic notes; that makes it easier to check whether
the result looks good as well. (So far I have been using an export of
OneNote's welcome pages: good example data, but all copyrighted by
Microsoft.)

Given good test data I can probably have a working import function in
a week or two.

Regards,

Jaap