I often need to convert from HTML to t2t.
There is already https://txt2tags.googlecode.com/svn/trunk/extras/unhtml.vim
but the result is not always clean and it's not scriptable.
Pandoc (http://johnmacfarlane.net/pandoc/) can also convert from HTML to
several other formats, but I didn't manage to adapt it to txt2tags (it seems
complicated to compile, so I didn't go further than a simple installation).
But I've just discovered this handy piece of software:
http://search.cpan.org/dist/HTML-WikiConverter/
It converts from HTML to some wiki formats, such as DokuWiki or MediaWiki. The
good news is that it's very easy to adapt to a new syntax. For example, part of
a definition file looks like this:
    b      => { start => '**', end => '**' },
    strong => { alias => 'b' },
    i      => { start => '//', end => '//' },
    em     => { alias => 'i' },
    u      => { start => '__', end => '__' },
I've created a txt2tags export:
https://textallion.googlecode.com/hg/contrib/HTML-WikiConverter-Txt2tags/lib/HTML/WikiConverter/Txt2tags.pm
Until it's clean enough to put on the txt2tags svn, I've put the (work in
progress) project there:
https://code.google.com/p/textallion/source/browse/contrib/HTML-WikiConverter-Txt2tags/
The archive is available at: https://code.google.com/p/textallion/downloads/list
Once html2wiki and this module are both installed, you can invoke it this way:
html2wiki --dialect Txt2tags file.html
You can even get remote files and convert them like this:
curl --silent http://theody.net/elements.html | html2wiki --dialect Txt2tags > elements.t2t
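
Since the point is to have something scriptable, the same invocation could be wrapped in a loop to convert a whole directory. This is only a minimal sketch, assuming html2wiki and the Txt2tags dialect are installed; the function name convert_dir and the HTML2WIKI variable are hypothetical names chosen for illustration, not part of html2wiki.

```shell
#!/bin/sh
# Hypothetical helper: convert every .html file in a directory to .t2t.
# HTML2WIKI is an assumed override hook (e.g. to point at another install).
HTML2WIKI=${HTML2WIKI:-html2wiki}

convert_dir() {
    dir=$1
    for src in "$dir"/*.html; do
        # Skip the unexpanded pattern when the directory has no .html files.
        [ -e "$src" ] || continue
        dest=${src%.html}.t2t
        # Same invocation as above, with the output redirected to a .t2t file.
        if "$HTML2WIKI" --dialect Txt2tags "$src" > "$dest"; then
            printf 'converted %s\n' "$dest"
        else
            printf 'failed %s\n' "$src" >&2
            rm -f "$dest"
        fi
    done
}
```

After sourcing this, something like `convert_dir ./pages` should write a .t2t file next to each .html file and remove the output of any conversion that fails.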
_______________________________________________
txt2tags-list mailing list
https://lists.sourceforge.net/lists/listinfo/txt2tags-list