Re: HTML and RTF: Very basic import and export strategy

2012-05-18 Thread Rashif Ray Rahman
On 15 May 2012 20:18, Wilfried wrote:
> Or it's not the latest version? Current version is 2.0.1, see
> http://sourceforge.net/projects/rtf2latex2e/

It is, actually.

> How shall rtf2latex2e know that YOU want it THIS way?
> The heading conversion above is default setting, but it can be changed.
> In the subfolder ./pref there is a file r2l-map in which it is specified
> how headings are to be converted.

It _shouldn't_, but I'd expect an option to switch. Well, yet another TODO :)

I've just found gnuhtml2latex (its odd name made me overlook it in my
searches), and it does provide an option to switch between numbered and
unnumbered sections.

> What are the rtf2latex2e calling parameters?
> Maybe you should call rtf2latex2e with the option -p1, not higher, see
> documentation.

Yes, I have tried -p1.

> That is a big difference. rtf2latex2e is aimed at Word's rtf output.
> Rtf from OOo and LibreOffice is broken.

Thanks, didn't know rtf was that complicated. A quick look inside an
rtf file gave me the impression that it'd be pretty standard across
all implementations as far as layout is concerned (formatting is
another story).

I've come to the conclusion that (X)HTML is a much better format to
deal with for this (though the rtf2latex2e website claims
otherwise). Even though gnuhtml2latex seems to do an OK job, its
output is riddled with stray characters.

This >> http://www.textfixer.com/html/convert-word-to-html.php << does
an excellent job. Would anyone know of a good command-line alternative
(for Linux)? A good solution would be a doc2html and a docx2html,
along with an HTML cleaner. I don't see any libraries for this aside
from lxml's HTML cleaner (lxml.html.clean) for Python (the quality of
which I don't know).
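
As a rough illustration of the kind of HTML cleaner meant here, the
sketch below uses only Python's standard library (html.parser) rather
than lxml. The tag whitelist, the class name StructureOnlyCleaner and
the helper clean_html are assumptions made for this example; they are
not part of any tool mentioned in this thread.

```python
import re
from html import escape
from html.parser import HTMLParser

# Tags kept for a structure-only import (an assumption; extend as needed).
KEEP = {"h1", "h2", "h3", "h4", "h5", "p"}
# Elements whose whole content is dropped, not just the tag itself.
DROP_CONTENT = {"head", "style", "script"}


class StructureOnlyCleaner(HTMLParser):
    """Re-emit whitelisted tags without attributes; keep text, drop the rest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.out.append("<" + tag + ">")  # attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and not self.skip_depth:
            self.out.append("</" + tag + ">")

    def handle_data(self, data):
        if not self.skip_depth:
            # Entities were decoded by the parser, so re-escape &, < and >.
            self.out.append(escape(data, quote=False))


def clean_html(markup):
    parser = StructureOnlyCleaner()
    parser.feed(markup)
    parser.close()
    text = re.sub(r"\s+", " ", "".join(parser.out))
    # Put each block element on its own line.
    text = re.sub(r"\s*(</(?:h[1-5]|p)>)\s*", r"\1\n", text)
    text = re.sub(r"(<(?:h[1-5]|p)>)\s*", r"\1", text)
    return text.strip()
```

Calling clean_html() on a page saved from Word should leave only the
heading and paragraph tags; inline markup such as <span> or <b> is
removed while its text is kept.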


--
GPG/PGP ID: C0711BF1


Re: HTML and RTF: Very basic import and export strategy

2012-05-15 Thread Wilfried
Rashif Ray Rahman wrote:

> Either rtf2latex2e does a very bad job, or I'm missing some tips on its usage.

Or it's not the latest version? Current version is 2.0.1, see
http://sourceforge.net/projects/rtf2latex2e/

> Heading 1 gets translated to Section* instead of Section, and it'd be
> good if Title were mapped to Chapter and not left alone. What's more,
> anything more than a Heading 3 gets no section at all. I believe up to
> Heading 5 can be mapped with Paragraph and Subparagraph. 

How should rtf2latex2e know that YOU want it THIS way?
The heading conversion above is the default setting, but it can be changed.
The subfolder ./pref contains a file, r2l-map, which specifies
how headings are converted.
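
If editing r2l-map is not an option, the starred headings could also be
fixed after the conversion. The following is only a sketch in Python: it
does not touch r2l-map (whose syntax is not shown in this thread), it
assumes the converter writes headings in the form \section*{...}, and the
name number_sections is made up for the example.

```python
import re

# Matches the star in \chapter*{, \section*{, \subsection*{,
# \subsubsection*{, \paragraph*{ and \subparagraph*{.
STARRED = re.compile(
    r"\\(chapter|(?:sub){0,2}section|(?:sub)?paragraph)\*(?=\s*[\[{])"
)


def number_sections(tex):
    """Turn starred sectioning commands into their numbered forms."""
    return STARRED.sub(r"\\\1", tex)
```

Applied to the text of the generated .tex file, this leaves everything
except the sectioning commands untouched.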

> What's worse
> is that there are plenty of forced spaces here, there and everywhere,
> along with some other gibberish that I did not want LaTeX to give me.

Which parameters do you call rtf2latex2e with?
Maybe you should call it with the option -p1, not higher; see the
documentation.

> When I typed the document(s) in Word or Writer, [...]

That is a big difference. rtf2latex2e is aimed at Word's rtf output.
Rtf from OOo and LibreOffice is broken.

Hope that helps,
--
Wilfried Hennings



HTML and RTF: Very basic import and export strategy

2012-05-14 Thread Rashif Ray Rahman
Hi guys

Either rtf2latex2e does a very bad job, or I'm missing some tips on its usage.

Heading 1 gets translated to Section* instead of Section, and it'd be
good if Title were mapped to Chapter and not left alone. What's more,
anything more than a Heading 3 gets no section at all. I believe up to
Heading 5 can be mapped with Paragraph and Subparagraph. What's worse
is that there are plenty of forced spaces here, there and everywhere,
along with some other gibberish that I did not want LaTeX to give me.
When I typed the document(s) in Word or Writer, I did no formatting at
all (myself) except for selecting paragraph styles (headings). In HTML
terms, that'd mean:

<h1>Some Section</h1>
La la la la...
   <-- this blank line here simply means new paragraph, not "forced" space
Bla bla bla...

What's even worse is that there appears to be no active html2latex
project. I do not see it anywhere in my distribution (I'm using Linux)
and I wonder whether there's any story to that. Anyway, even if there
were, I'd have to resort to online 'cleanup' tools to paste my
document and get some clean HTML markup. Neither Word nor Writer
outputs anything useful, and I don't want to jump through the hoops of
MS Word > Writer > LaTeX extension > TeX file with gibberish when my
document is in fact dead simple.

So... is there a way to import and export _very_ basic documents? If
not, it's time to get coding (note to self as well as others). I
didn't manage to use LyX's import functions: even with rtf2latex2e
installed I don't see an import option. I did see HTML import before,
but after reconfiguring recently it is nowhere to be seen in the UI.

The process should preserve only the layout and structure (i.e.
sectioning). There is no need to deal with figures or tables, and even
retaining formatting (bold and italic fonts) is not a requirement.
Paragraph spacing should conform to LyX settings, whereby an empty
line is removed if there is no provision for such spacing in LyX.

This way, one could use Word or Writer to finish up the content, save
to RTF or HTML, and then import into LyX. Really, this is theoretically
a no-brainer, since you'd be dealing with only headings. It can be
accomplished with sed!
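
A minimal sketch of that sed-style idea, written in Python for
readability. It assumes already-clean HTML (heading and <p> tags without
attributes), follows the heading mapping suggested earlier in this
message (Heading 1 to \section, down to Heading 5 as \subparagraph), and
does not escape LaTeX special characters such as % or &. The names LEVELS
and html_to_latex are hypothetical.

```python
import re
from html import unescape

# HTML heading level -> LaTeX sectioning command.
LEVELS = {
    "1": "section",
    "2": "subsection",
    "3": "subsubsection",
    "4": "paragraph",
    "5": "subparagraph",
}

HEADING = re.compile(r"<h([1-5])>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
PARAGRAPH = re.compile(r"</?p>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def _heading(match):
    # Blank lines around the command make it a paragraph of its own.
    return "\n\n\\" + LEVELS[match.group(1)] + "{" + match.group(2) + "}\n\n"


def html_to_latex(html):
    body = HEADING.sub(_heading, html)
    body = PARAGRAPH.sub("\n\n", body)      # paragraph tags become blank lines
    body = unescape(ANY_TAG.sub("", body))  # drop <b>, <i>, ..., decode entities
    # Collapse any run of blank lines into exactly one paragraph break.
    paragraphs = (" ".join(p.split()) for p in re.split(r"\n\s*\n", body))
    return "\n\n".join(p for p in paragraphs if p)
```

Extra empty lines in the input collapse into a single paragraph break,
and bold or italic markup is dropped, in line with the requirements
listed above.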


--
GPG/PGP ID: C0711BF1


Re: HTML and RTF: Very basic import and export strategy

2012-05-14 Thread Richard Heck

On 05/14/2012 08:36 AM, Rashif Ray Rahman wrote:

> Hi guys
>
> Either rtf2latex2e does a very bad job, or I'm missing some tips on its usage.

It looks to me as if this is under active development:
http://sourceforge.net/tracker/?atid=374868&group_id=22324&func=browse
so you could try reporting bugs there.



> What's even worse is that there appears to be no active html2latex
> project. I do not see it anywhere in my distribution (I'm using Linux)
> and I wonder whether there's any story to that. Anyway, even if there
> were, I'd have to resort to online 'cleanup' tools to paste my
> document and get some clean HTML markup. Neither Word nor Writer
> outputs anything useful, and I don't want to go through the hoop of Ms
> Word > Writer > LaTeX extension > TeX file with gibberish when my
> document in fact is dead simple.

Writer does a much better job nowadays than it used to do, because the
LaTeX output is more configurable. Try the "Ultra clean article" export,
for example. (There's no need to involve Word in any way here.)

Better yet, download the writer2latex binary from
http://writer2latex.sourceforge.net/
and the PyODConverter from:
https://github.com/mirkonasato/pyodconverter
and you can do it all from the command line. E.g.:
python DocumentConverter.py myfile.rtf myfile.odt
w2l -clean myfile.odt
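
For more than a handful of files, the two commands above could be
wrapped in a small script. This is a sketch only: it assumes
DocumentConverter.py is in the current directory and w2l is on the PATH,
both set up as their own documentation describes, and the function names
are invented for the example.

```python
import subprocess
from pathlib import Path


def conversion_commands(rtf_file):
    """Return the two command lines shown above for one RTF file."""
    odt_file = Path(rtf_file).with_suffix(".odt")
    return [
        ["python", "DocumentConverter.py", str(rtf_file), str(odt_file)],
        ["w2l", "-clean", str(odt_file)],
    ]


def convert(rtf_file):
    """Run both steps in order, stopping at the first one that fails."""
    for command in conversion_commands(rtf_file):
        subprocess.run(command, check=True)
```

Calling convert("myfile.rtf") runs exactly the two commands from the
example; check=True raises an exception if either tool exits with an
error.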

Richard



Re: HTML and RTF: Very basic import and export strategy

2012-05-14 Thread Nico Williams
Richard,

Does LyX XHTML output preserve enough LyX metadata to be suitable as
an import format?

Nico
--


Re: HTML and RTF: Very basic import and export strategy

2012-05-14 Thread Richard Heck

On 05/14/2012 10:26 AM, Nico Williams wrote:

> Richard,
>
> Does LyX XHTML output preserve enough LyX metadata to be suitable as
> an import format?

You mean back into LyX?

Richard



Re: HTML and RTF: Very basic import and export strategy

2012-05-14 Thread Nico Williams
On Mon, May 14, 2012 at 9:30 AM, Richard Heck wrote:
> On 05/14/2012 10:26 AM, Nico Williams wrote:
>>
>> Richard,
>>
>> Does LyX XHTML output preserve enough LyX metadata to be suitable as
>> an import format?
>
> You mean back into LyX?

Yes.  With XML formats becoming ubiquitous that seems like it'd be useful.

Nico
--