Re: HTML and RTF: Very basic import and export strategy
On 15 May 2012 20:18, Wilfried wrote:
> Or it's not the latest version? Current version is 2.0.1, see
> http://sourceforge.net/projects/rtf2latex2e/

It is, actually.

> How shall rtf2latex2e know that YOU want it THIS way?
> The heading conversion above is default setting, but it can be changed.
> In the subfolder ./pref there is a file r2l-map in which it is specified
> how headings are to be converted.

It _shouldn't_, but I'd expect an option to switch. Well, yet another TODO :)

I've just found gnuhtml2latex (my eyes skipped over it in searches because of the strange name), and it does provide an option to switch between numbered and unnumbered sections.

> What are the rtf2latex2e calling parameters?
> Maybe you should call rtf2latex2e with the option -p1, not higher, see
> documentation.

Yes, I have tried -p1.

> That is a big difference. rtf2latex2e is aimed at Word's rtf output.
> Rtf from OOo and LibreOffice is broken.

Thanks, didn't know rtf was that complicated. A quick look inside an rtf file gave me the impression that it'd be pretty standard across all implementations as far as layout is concerned (formatting is another story).

I've come to the conclusion that (x)html is a much better format to deal with for this (though the website of rtf2latex2e says otherwise). Even though gnuhtml2latex seems to do an OK job, the output is riddled with silly characters everywhere.

This >> http://www.textfixer.com/html/convert-word-to-html.php << does an excellent job. Would anyone know of a good commandline alternative (for Linux)?

A good solution would be a doc2html and a docx2html, along with an html cleaner. I don't see any libraries for this aside from lxml's html clean method for python (the quality of which I don't know).

--
GPG/PGP ID: C0711BF1
Re: HTML and RTF: Very basic import and export strategy
Rashif Ray Rahman wrote:
> Either rtf2latex2e does a very bad job, or I'm missing some tips on its usage.

Or it's not the latest version? Current version is 2.0.1, see
http://sourceforge.net/projects/rtf2latex2e/

> Heading 1 gets translated to Section* instead of Section, and it'd be
> good if Title were mapped to Chapter and not left alone. What's more,
> anything more than a Heading 3 gets no section at all. I believe up to
> Heading 5 can be mapped with Paragraph and Subparagraph.

How shall rtf2latex2e know that YOU want it THIS way?
The heading conversion above is the default setting, but it can be changed.
In the subfolder ./pref there is a file r2l-map in which it is specified
how headings are to be converted.

> What's worse
> is that there are plenty of forced spaces here, there and everywhere,
> along with some other gibberish that I did not want LaTeX to give me.

What are the rtf2latex2e calling parameters?
Maybe you should call rtf2latex2e with the option -p1, not higher; see
the documentation.

> When I typed the document(s) in Word or Writer, [...]

That is a big difference. rtf2latex2e is aimed at Word's rtf output.
Rtf from OOo and LibreOffice is broken.

Hope that helps,
--
Wilfried Hennings
Re: HTML and RTF: Very basic import and export strategy
On Mon, May 14, 2012 at 9:30 AM, Richard Heck wrote:
> On 05/14/2012 10:26 AM, Nico Williams wrote:
>>
>> Richard,
>>
>> Does LyX XHTML output preserve enough LyX metadata to be suitable as
>> an import format?
>
> You mean back into LyX?

Yes. With XML formats becoming ubiquitous that seems like it'd be useful.

Nico
--
Re: HTML and RTF: Very basic import and export strategy
On 05/14/2012 10:26 AM, Nico Williams wrote:
> Richard,
>
> Does LyX XHTML output preserve enough LyX metadata to be suitable as
> an import format?

You mean back into LyX?

Richard
Re: HTML and RTF: Very basic import and export strategy
Richard,

Does LyX XHTML output preserve enough LyX metadata to be suitable as an import format?

Nico
--
Re: HTML and RTF: Very basic import and export strategy
On 05/14/2012 08:36 AM, Rashif Ray Rahman wrote:
> Hi guys
>
> Either rtf2latex2e does a very bad job, or I'm missing some tips on its usage.

It looks to me as if this is under active development:
http://sourceforge.net/tracker/?atid=374868&group_id=22324&func=browse
so you could try reporting bugs there.

> What's even worse is that there appears to be no active html2latex
> project. I do not see it anywhere in my distribution (I'm using Linux)
> and I wonder whether there's any story to that. Anyway, even if there
> were, I'd have to resort to online 'cleanup' tools to paste my
> document and get some clean HTML markup. Neither Word nor Writer
> outputs anything useful, and I don't want to go through the hoop of Ms
> Word > Writer > LaTeX extension > TeX file with gibberish when my
> document in fact is dead simple.

Writer does a much better job nowadays than it used to, because the LaTeX output is more configurable. Try the "Ultra clean article" export, for example. (There's no need to involve Word in any way here.)

Better yet, download the writer2latex binary from
http://writer2latex.sourceforge.net/
and the PyODConverter from:
https://github.com/mirkonasato/pyodconverter
and you can do it all from the command line. E.g.:

    python DocumentConverter.py myfile.rtf myfile.odt
    w2l -clean myfile.odt

Richard
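The two commands Richard gives can be chained from a small script. What follows is only a minimal sketch, not something posted in the thread: the function names `build_commands` and `convert` are hypothetical, and it assumes that `DocumentConverter.py` sits in the current directory, that `w2l` is on the PATH, and that whatever office setup PyODConverter needs is already in place (see its README). The only command-line arguments used are the ones quoted in the message.

```python
import subprocess
from pathlib import Path


def build_commands(rtf_path, converter_script="DocumentConverter.py"):
    """Return the two command lines from the message as argument lists.

    converter_script is an assumed location for PyODConverter's script;
    adjust it to wherever the file was downloaded.
    """
    rtf = Path(rtf_path)
    odt = rtf.with_suffix(".odt")  # myfile.rtf -> myfile.odt
    return [
        # Step 1: RTF -> ODT via PyODConverter
        ["python", converter_script, str(rtf), str(odt)],
        # Step 2: ODT -> LaTeX via writer2latex, using the -clean option
        # quoted in the message
        ["w2l", "-clean", str(odt)],
    ]


def convert(rtf_path, converter_script="DocumentConverter.py"):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(rtf_path, converter_script):
        # check=True raises CalledProcessError if a tool exits non-zero
        subprocess.run(cmd, check=True)
```

Calling `convert("myfile.rtf")` would run the same two steps as the shell commands above; keeping `build_commands` separate makes it possible to inspect the command lines without having either tool installed.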
HTML and RTF: Very basic import and export strategy
Hi guys

Either rtf2latex2e does a very bad job, or I'm missing some tips on its usage. Heading 1 gets translated to Section* instead of Section, and it'd be good if Title were mapped to Chapter and not left alone. What's more, anything more than a Heading 3 gets no section at all. I believe up to Heading 5 can be mapped with Paragraph and Subparagraph. What's worse is that there are plenty of forced spaces here, there and everywhere, along with some other gibberish that I did not want LaTeX to give me.

When I typed the document(s) in Word or Writer, I did no formatting at all (myself) except for selecting paragraph styles (headings). In HTML terms, that'd mean:

    Some Section
    La la la la...
        <-- this blank line here simply means new paragraph, not "forced" space
    Bla bla bla...

What's even worse is that there appears to be no active html2latex project. I do not see it anywhere in my distribution (I'm using Linux) and I wonder whether there's any story to that. Anyway, even if there were, I'd have to resort to online 'cleanup' tools to paste my document and get some clean HTML markup. Neither Word nor Writer outputs anything useful, and I don't want to jump through the hoops of MS Word > Writer > LaTeX extension > TeX file with gibberish when my document is in fact dead simple.

So... is there a way to import and export _very_ basic documents? If not, it's time to get coding (note to self as well as others). I didn't manage to use LyX's import functions, as even with rtf2latex2e I don't see an option. I did see HTML import before, but after reconfiguring recently it is nowhere to be seen in the UI.

The process should preserve only the layout and structure (i.e. sectioning). There is no need to deal with figures or tables, and even retaining formatting (bold and italic fonts) is not a requirement. Paragraph spacing should conform to LyX settings, whereby an empty line is removed if there is no provision for such spacing in LyX.
This way, one could use Word or Writer to finish up the content, save to RTF or HTML, and then import in LyX. Really, this is theoretically a no-brainer, since you'd be dealing with only headings. It can be accomplished with sed!

--
GPG/PGP ID: C0711BF1
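To make the "structure only" idea concrete, here is a rough sketch of such a converter. It is written in Python with the standard-library `html.parser` rather than sed, and it is an illustration under stated assumptions, not an existing tool: the names `SECTIONING`, `StructureOnlyConverter` and `html_to_latex_body` are made up for this example, the input is assumed to be reasonably clean HTML that uses only `<h1>` to `<h5>` and `<p>` for structure, and the heading-to-command mapping follows the wish expressed above (Heading 1 to Section, down to Heading 5 to Subparagraph). Inline formatting is dropped, empty paragraphs are removed, and only a document body is produced, with no preamble.

```python
from html.parser import HTMLParser

# Assumed mapping from HTML heading levels to LaTeX sectioning commands.
SECTIONING = {
    "h1": "section",
    "h2": "subsection",
    "h3": "subsubsection",
    "h4": "paragraph",
    "h5": "subparagraph",
}

# Characters that need escaping in LaTeX body text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


class StructureOnlyConverter(HTMLParser):
    """Collect headings and paragraphs; ignore all other markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []     # finished LaTeX blocks, in document order
        self.current = None  # tag of the block currently open, if any
        self.buffer = []     # text pieces of the open block

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING or tag == "p":
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        # Text outside headings and paragraphs is discarded.
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self.current:
            return  # inline tags such as </b>: formatting is dropped
        # Collapse runs of whitespace, then escape for LaTeX.
        text = escape_latex(" ".join("".join(self.buffer).split()))
        if text:  # empty paragraphs are removed
            if tag == "p":
                self.blocks.append(text)
            else:
                self.blocks.append("\\%s{%s}" % (SECTIONING[tag], text))
        self.current = None


def html_to_latex_body(html):
    """Return LaTeX body text with blocks separated by blank lines."""
    parser = StructureOnlyConverter()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks) + "\n"
```

Feeding the contents of a saved HTML file to `html_to_latex_body` gives text that can be pasted between `\begin{document}` and `\end{document}`. A real Word or Writer export would most likely need cleaning first, since this sketch does not handle nested block elements, lists, or tables.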