On Fri, Mar 15, 2013 at 8:12 AM, Jaap Karssenberg
<jaap.karssenb...@gmail.com> wrote:
> On Thu, Mar 14, 2013 at 5:39 PM, Michael Spranger
> <mikeitsecur...@gmail.com> wrote:
>> How much effort would it take to get that self contained HTML to import into
>> zim?  I am not a scripter so I am of no help there.
>
> I got some code to unpack the stand alone HTML, that part is easy.
> Next step will be converting the HTML to text while preserving at
> least images and bullet lists. Some other markup can be preserved, but
> most may get lost. Tables will end up as lines of text.
>
> One limitation I see at the moment for the OneNote importer is that
> when I export a section from OneNote I get multiple pages in a single
> HTML file. Unfortunately the start of a new page is not clearly marked
> in the HTML, so splitting up in multiple pages will not be very
> robust.

OK, I also found some code I hacked some time ago to import fragments
of HTML. Will have to put the two together to get a real solution.
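
For anyone curious about the unpacking step: an .mht file is a MIME multipart message, so Python's standard email module can split it into its parts. The following is only a minimal sketch under that assumption, not the code mentioned above; the function name unpack_mht and the fallback part names are made up for illustration.

```python
import email
from email import policy


def unpack_mht(data):
    """Split the bytes of an .mht file into its MIME parts.

    Returns a dict mapping each part's Content-Location (or a generated
    fallback name, which is an assumption of this sketch) to a tuple of
    (content type, decoded payload bytes).
    """
    msg = email.message_from_bytes(data, policy=policy.default)
    parts = {}
    for index, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue  # containers carry no payload of their own
        location = part.get("Content-Location")
        name = str(location) if location else "part-%d" % index
        # decode=True undoes base64 / quoted-printable transfer encoding
        parts[name] = (part.get_content_type(), part.get_payload(decode=True))
    return parts
```

The HTML page and each embedded image come out as separate entries, so the images can be written to disk next to the converted page.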

What I need at this point to proceed is some test data:
* .mht export of a notebook section containing multiple pages
* include some images
* include some bullet lists
* include headings and sub-headings (level 1 / 2 / ...)
* use bold / italic / ...
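
To show why each of these elements matters, here is a rough sketch of the kind of HTML-to-text conversion such test data would exercise, built on Python's standard html.parser. It is not the importer itself. The class and function names are hypothetical, and the output markup (`====== Heading ======`, `**bold**`, `//italic//`, `* item`, `{{image}}`) is my assumption of how zim's wiki syntax would be targeted.

```python
import re
from html.parser import HTMLParser


class ZimTextConverter(HTMLParser):
    """Hypothetical converter: keeps headings, bold/italic, bullets, images."""

    INLINE = {"b": "**", "strong": "**", "i": "//", "em": "//"}
    # Assumed mapping: more '=' signs means a higher-level heading.
    HEADINGS = {"h1": 6, "h2": 5, "h3": 4, "h4": 3, "h5": 2}
    BLOCKS = ("p", "div", "ul", "ol", "li", "tr")

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append("\n" + "=" * self.HEADINGS[tag] + " ")
        elif tag == "li":
            self.out.append("\n* ")
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.out.append("{{%s}}" % src)
        elif tag in ("td", "th"):
            self.out.append(" ")  # table cells collapse into one line of text
        elif tag in ("p", "div", "br", "tr"):
            self.out.append("\n")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append(" " + "=" * self.HEADINGS[tag] + "\n")
        elif tag in self.BLOCKS:
            self.out.append("\n")

    def handle_data(self, data):
        # Collapse runs of whitespace, as a browser would
        self.out.append(re.sub(r"\s+", " ", data))

    def text(self):
        lines = [line.strip() for line in "".join(self.out).split("\n")]
        return "\n".join(line for line in lines if line)


def html_to_zim_text(html):
    parser = ZimTextConverter()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Feeding it a small fragment with a heading, a paragraph, a two-item list and an image yields one line per block element. All other markup is silently dropped, and blank lines between paragraphs are not preserved in this sketch.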

Please make sure that such test data is not private and is free of
copyright, so I can add it to zim's test suite eventually. Try to make it
look like realistic notes; that makes it easier to check whether the
result looks good as well. (So far I have been using an export of OneNote's
welcome pages, good example data but all copyrighted by Microsoft.)

Given good test data I can probably have a working import function in
a week or two.

Regards,

Jaap

_______________________________________________
Mailing list: https://launchpad.net/~zim-wiki
Post to     : zim-wiki@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zim-wiki
More help   : https://help.launchpad.net/ListHelp
