Re: HTML and RTF: Very basic import and export strategy
On 15 May 2012 20:18, Wilfried wrote:
> Or it's not the latest version? Current version is 2.0.1, see
> http://sourceforge.net/projects/rtf2latex2e/

It is, actually.

> How shall rtf2latex2e know that YOU want it THIS way?
> The heading conversion above is default setting, but it can be changed.
> In the subfolder ./pref there is a file r2l-map in which it is specified
> how headings are to be converted.

It _shouldn't_, but I'd expect an option to switch. Well, yet another TODO :)

I've just found gnuhtml2latex (I had overlooked it in searches because of its strange name), and it does provide an option to switch between numbered and unnumbered sections.

> What are the rtf2latex2e calling parameters?
> Maybe you should call rtf2latex2e with the option -p1, not higher, see
> documentation.

Yes, I have tried -p1.

> That is a big difference. rtf2latex2e is aimed at Word's rtf output.
> Rtf from OOo and LibreOffice is broken.

Thanks, I didn't know rtf was that complicated. A quick look inside an rtf file gave me the impression that it'd be pretty standard across all implementations as far as layout is concerned (formatting is another story).

I've come to the conclusion that (x)html is a much better format to deal with for this (though the rtf2latex2e website says otherwise). Even though gnuhtml2latex seems to do an OK job, its output is riddled with silly characters everywhere.

This >> http://www.textfixer.com/html/convert-word-to-html.php << does an excellent job. Would anyone know of a good command-line alternative (for Linux)? A good solution would be a doc2html and a docx2html, along with an HTML cleaner. I don't see any libraries for this aside from lxml's html clean method for Python (the quality of which I don't know).

--
GPG/PGP ID: C0711BF1
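As a rough illustration of what such a command-line HTML cleaner could do, here is a minimal sketch that uses only Python's standard library. It is not lxml's cleaner and not the method used by the website above; the names `clean_html`, `KEEP` and `SKIP_CONTENT` and the choice of allowed tags are assumptions made for this example. It drops every attribute, discards tags outside a small allowlist while keeping their text, and removes the contents of style and script blocks.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: structural tags worth keeping for a later LaTeX step.
KEEP = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol", "li",
        "b", "i", "em", "strong", "br", "table", "tr", "td", "th"}
# Elements whose whole content is thrown away.
SKIP_CONTENT = {"head", "style", "script"}


class _TagFilter(HTMLParser):
    """Re-emit allowed tags without attributes and keep the text of the rest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and tag != "br" and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Character references were decoded by the parser; escape again.
            self.out.append(escape(data, quote=False))


def clean_html(markup):
    """Return markup reduced to the allowlisted tags, with no attributes."""
    parser = _TagFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Calling `clean_html(open("export.html").read())` on a Word HTML export (the file name is only an example) would strip things like `class="MsoNormal"`, `<span>` wrappers and `<o:p>` elements. The sketch does not repair badly nested markup, so real-world input may need a proper parser.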
Re: HTML and RTF: Very basic import and export strategy
Rashif Ray Rahman wrote:
> Either rtf2latexe does a very bad job, or I'm missing some tips on its usage.

Or it's not the latest version? Current version is 2.0.1, see
http://sourceforge.net/projects/rtf2latex2e/

> Heading 1 gets translated to Section* instead of Section, and it'd be
> good if Title were mapped to Chapter and not left alone. What's more,
> anything more than a Heading 3 gets no section at all. I believe up to
> Heading 5 can be mapped with Paragraph and Subparagraph.

How shall rtf2latex2e know that YOU want it THIS way?
The heading conversion above is default setting, but it can be changed.
In the subfolder ./pref there is a file r2l-map in which it is specified
how headings are to be converted.

> What's worse
> is that there are plenty of forced spaces here, there and everywhere,
> along with some other gibberish that I did not want LaTeX to give me.

What are the rtf2latex2e calling parameters?
Maybe you should call rtf2latex2e with the option -p1, not higher, see
documentation.

> When I typed the document(s) in Word or Writer, [...]

That is a big difference. rtf2latex2e is aimed at Word's rtf output.
Rtf from OOo and LibreOffice is broken.

Hope that helps,
--
Wilfried Hennings
Re: HTML and RTF: Very basic import and export strategy
On Mon, May 14, 2012 at 9:30 AM, Richard Heck wrote:
> On 05/14/2012 10:26 AM, Nico Williams wrote:
>>
>> Richard,
>>
>> Does LyX XHTML output preserve enough LyX metadata to be suitable as
>> an import format?
>
> You mean back into LyX?

Yes. With XML formats becoming ubiquitous that seems like it'd be useful.

Nico
--
Re: HTML and RTF: Very basic import and export strategy
On 05/14/2012 10:26 AM, Nico Williams wrote:
> Richard,
>
> Does LyX XHTML output preserve enough LyX metadata to be suitable as
> an import format?

You mean back into LyX?

Richard
Re: HTML and RTF: Very basic import and export strategy
Richard,

Does LyX XHTML output preserve enough LyX metadata to be suitable as
an import format?

Nico
--
Re: HTML and RTF: Very basic import and export strategy
On 05/14/2012 08:36 AM, Rashif Ray Rahman wrote:
> Hi guys
>
> Either rtf2latexe does a very bad job, or I'm missing some tips on its usage.

It looks to me as if this is under active development:
http://sourceforge.net/tracker/?atid=374868&group_id=22324&func=browse
so you could try reporting bugs there.

> What's even worse is that there appears to be no active html2latex
> project. I do not see it anywhere in my distribution (I'm using Linux)
> and I wonder whether there's any story to that. Anyway, even if there
> were, I'd have to resort to online 'cleanup' tools to paste my document
> and get some clean HTML markup. Neither Word nor Writer outputs anything
> useful, and I don't want to go through the hoop of Ms Word > Writer >
> LaTeX extension > TeX file with gibberish when my document in fact is
> dead simple.

Writer does a much better job nowadays than it used to do, because the LaTeX output is more configurable. Try the "Ultra clean article" export, for example. (There's no need to involve Word in any way here.)

Better yet, download the writer2latex binary from
http://writer2latex.sourceforge.net/
and the PyODConverter from:
https://github.com/mirkonasato/pyodconverter
and you can do it all from the command line. E.g.:

    python DocumentConverter.py myfile.rtf myfile.odt
    w2l -clean myfile.odt

Richard
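The two commands above could be chained from a small script. The following is only a sketch: the function names `build_commands` and `convert` are made up for this example, and it assumes that `DocumentConverter.py` sits in the current directory, that `w2l` is on the PATH, and that whatever PyODConverter needs at run time (as far as I know, an office instance it can connect to) is already set up.

```python
import subprocess
from pathlib import Path


def build_commands(rtf_path, python="python",
                   converter_script="DocumentConverter.py"):
    """Return the two command lines from the message as argument lists."""
    rtf = Path(rtf_path)
    odt = rtf.with_suffix(".odt")  # myfile.rtf -> myfile.odt
    return [
        # Step 1: RTF -> ODT via PyODConverter (script location is assumed).
        [python, converter_script, str(rtf), str(odt)],
        # Step 2: ODT -> LaTeX via writer2latex, using the -clean option
        # shown in the message.
        ["w2l", "-clean", str(odt)],
    ]


def convert(rtf_path):
    """Run both steps in order; stop at the first one that fails."""
    for argv in build_commands(rtf_path):
        subprocess.run(argv, check=True)
```

Calling `convert("myfile.rtf")` runs exactly the two commands quoted above. Because `check=True` raises on a non-zero exit status, a failed first conversion does not hand a missing .odt file to w2l.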