Re: Html or Pdf to Rtf (Linux) with Python

2004-12-17 Thread Axel Straschil
Hello!

 You might take a look at PyRTF in PyPI. It's still in beta,

I think PyRTF would be the right choice, thanks. I just had a short look
at it.

Regards,
AXEL.
-- 
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be
interpreted as described in RFC 2119 [http://ietf.org/rfc/rfc2119.txt]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Html or Pdf to Rtf (Linux) with Python

2004-12-17 Thread Stephen Thorne
On Fri, 17 Dec 2004 07:55:10 +0000 (UTC), Axel Straschil
[EMAIL PROTECTED] wrote:
 Hello!
 
  I've been able to successfully get konqueror to generate a pdf from a
  html file via dcop. It's something along the lines of:
 
 For that stuff, I'm using htmldoc (http://www.htmldoc.org/).

I found htmldoc and every other open-source, purpose-built HTML-to-PDF
converter to be deficient enough to discourage us from using them. For
our requirements, only web browsers had the quality of rendering
required.

Stephen.


Re: Html or Pdf to Rtf (Linux) with Python

2004-12-16 Thread Mike Meyer
Axel Straschil [EMAIL PROTECTED] writes:

 Hello!

  However, our company's product, PDFTextStream does do a phenomenal
  job of extracting text and metadata out of PDF documents.  It's
  crazy-fast, has a clean API, and in general gets the job done very
  nicely.  It presents two points of compromise from your ideal
  situation:
  1. It only produces text, so you would have to take the text it
  provides and write it out as an RTF yourself (there are tons of
  packages and tools that do this).  Since the RTF format has pretty
  weak formatting capabilities compared

 I've got the input source in HTML; the problem is converting from any
 format to RTF. Please give me a hint where the tons of packages are.

That's easy. Load the HTML in MS Word, and save it as RTF. Script it
via COM using the python win32all (I think that's what it's now
called) package.
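
A minimal sketch of that Word-over-COM route, for readers who do have a Windows machine with Word available. It assumes the pywin32 package (the successor to win32all) is installed; the function name and the paths in the usage note are hypothetical. It cannot run on Linux.

```python
WD_FORMAT_RTF = 6  # numeric value of Word's wdFormatRTF constant


def html_to_rtf_via_word(html_path, rtf_path):
    """Open an HTML file in MS Word and save it as RTF (Windows only).

    Both paths should be absolute, because Word resolves relative
    paths against its own working directory.
    """
    # Imported here so the module still loads where pywin32 is absent.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(html_path)
        try:
            doc.SaveAs(rtf_path, FileFormat=WD_FORMAT_RTF)
        finally:
            doc.Close(False)  # close without saving changes to the source
    finally:
        word.Quit()
```

Usage would look like `html_to_rtf_via_word(r"C:\docs\page.html", r"C:\docs\page.rtf")`.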

mike
-- 
Mike Meyer [EMAIL PROTECTED]  http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


Re: Html or Pdf to Rtf (Linux) with Python

2004-12-16 Thread Stephen Thorne
On Thu, 16 Dec 2004 19:30:37 +0000 (UTC), Axel Straschil
[EMAIL PROTECTED] wrote:
  That's easy. Load the HTML in MS Word, and save it as RTF. Script it
  via COM using the python win32all (I think that's what it's now
  called) package.
 
 As I wrote in my posting and the subject: Linux ;-)
 I could try to do this with OpenOffice, but I'm afraid this will not
 be a performant solution ;-(
 I really spent hours on that; the only thing I found was a
 specification for rich text, maybe a good point to start a project ...

I've been able to successfully get konqueror to generate a pdf from a
html file via dcop. It's something along the lines of:
% dcop konqueror-25827 html-widget1 print 1
You can launch konq in a xvfb (X Virtual Framebuffer) then communicate
via dcop to send commands to the browser (load this url, print this
page, etc).
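
A rough sketch of how that could be driven from Python. Only the `dcop konqueror-<pid> html-widget1 print 1` call comes from the command above; the helper names, the display number `:99`, and the assumption that the DCOP application id uses the PID of the process we started are all guesses (Konqueror may re-launch itself under a different PID, and a DCOP server must be running for the session).

```python
import os
import subprocess


def dcop_print_command(konqueror_pid, widget="html-widget1"):
    """Build the argv for: dcop konqueror-<pid> html-widget1 print 1"""
    return ["dcop", "konqueror-%d" % konqueror_pid, widget, "print", "1"]


def start_konqueror(url, display=":99"):
    """Start Xvfb plus a Konqueror instance on that virtual display.

    Returns both process objects so the caller can terminate them.
    The display number is an arbitrary choice for this sketch.
    """
    xvfb = subprocess.Popen(["Xvfb", display])
    env = dict(os.environ, DISPLAY=display)
    konq = subprocess.Popen(["konqueror", url], env=env)
    return xvfb, konq


def print_page(konqueror_pid):
    """Ask the running Konqueror to print; returns dcop's exit status."""
    return subprocess.call(dcop_print_command(konqueror_pid))
```

After `xvfb, konq = start_konqueror(url)` and a pause for the page to load, `print_page(konq.pid)` would issue the print request. If the id does not match, the registered application names have to be looked up with dcop first.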

I've been investigating doing the same feat using JS/XUL/etc in
mozilla. It probably is possible. There's lots of documentation about
the XPCOM api available from http://xulplanet.com/

As for converting to RTF, someone has already pointed out PyRTF.

Regards,
Stephen Thorne


Re: Html or Pdf to Rtf (Linux) with Python

2004-12-16 Thread Axel Straschil
Hello!

 I've been able to successfully get konqueror to generate a pdf from a
 html file via dcop. It's something along the lines of:

For that stuff, I'm using htmldoc (http://www.htmldoc.org/).

Regards,
AXEL.


Re: Html or Pdf to Rtf (Linux) with Python

2004-12-15 Thread Chas Emerick
I haven't seen any solid responses come across the wire, and I suspect 
there isn't a product or package that will do exactly what you want.

<blatant_self_promotion>
However, our company's product, PDFTextStream, does do a phenomenal job
of extracting text and metadata out of PDF documents.  It's crazy-fast,
has a clean API, and in general gets the job done very nicely.  It
presents two points of compromise from your ideal situation:

1. It only produces text, so you would have to take the text it
provides and write it out as an RTF yourself (there are tons of
packages and tools that do this).  Since the RTF format has pretty weak
formatting capabilities compared to PDF (and even compared to
HTML+CSS), you'd likely never reproduce the original layout/content of
the source document anyway.

2. It is a Java library.  You indicated in a later message that you
were aiming to use a Python package if possible just out of personal
preference.  Assuming such a thing does not exist, and you are able to
introduce a Java component to your project, this would become a
non-issue.
</blatant_self_promotion>
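
To make point 1 concrete, here is a minimal sketch of writing already-extracted text out as RTF in plain Python. It is not PDFTextStream's API or any particular package; the function names are made up for illustration, and it only emits paragraphs in a single font with no other formatting.

```python
def rtf_escape(text):
    """Escape plain text so it can sit inside an RTF document body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)      # characters with special meaning in RTF
        elif ch == "\n":
            out.append("\\line ")      # forced line break inside a paragraph
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes signed 16-bit UTF-16 code units; "?" is the
            # fallback shown by readers that do not understand \u.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)


def text_to_rtf(paragraphs, font="Times New Roman"):
    """Wrap a list of plain-text paragraphs in a minimal RTF document."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 %s;}}\\f0\\fs24 " % font
    body = "".join("%s\\par\n" % rtf_escape(p) for p in paragraphs)
    return header + body + "}"
```

Writing `text_to_rtf(["First paragraph", "Second paragraph"])` to a file with an `.rtf` extension should give something a word processor can open. Anything beyond plain paragraphs (bold, tables, images) would need a real library.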

Let me know what your questions are.
Chas Emerick
[EMAIL PROTECTED]
Snowtide Informatics Systems
PDFTextStream: fast PDF text extraction for Java apps and Lucene
http://snowtide.com/home/PDFTextStream/

Alexander Straschil wrote:
 Hello!

 I have to convert an HTML document to rtf with python, was just googling
 for an hour and did find nothing ;-(
 Has anybody an Idea how to convert (under Linux)  an HTML or Pdf Document
 to Rtf?

 Thanks, AXEL


Re: Html or Pdf to Rtf (Linux) with Python

2004-12-15 Thread Axel Straschil
Hello!

 However, our company's product, PDFTextStream does do a phenomenal job of
 extracting text and metadata out of PDF documents.  It's crazy-fast, has a
 clean API, and in general gets the job done very nicely.  It presents two
 points of compromise from your ideal situation:
 1. It only produces text, so you would have to take the text it provides and
 write it out as an RTF yourself (there are tons of packages and tools that do
 this).  Since the RTF format has pretty weak formatting capabilities compared

I've got the input source in HTML; the problem is converting from any
format to RTF. Please give me a hint where the tons of packages are.

Thanks,
AXEL.


Re: Html or Pdf to Rtf (Linux) with Python

2004-12-14 Thread Cameron Laird
In article [EMAIL PROTECTED],
Axel Straschil  [EMAIL PROTECTED] wrote:
 Hello!

 Sorry Cameron, I was replying, now my follow-up ;-):

  Are you trying to convert one document in particular, or automate the
  process of converting arbitrary HTML documents?

 I have a small CMS where the customer has the possibility to view
 certain HTML pages as PDF; the CMS is Python-based. I also thought about
 passing the URL to an external converter script, but found nothing ;-(

  What computing host is available to you--Win*?  Linux?  MacOS?
  Solaris!?

 Linux

  Is Word installed?

 No.

  OpenOffice?

 Yes.

  Why have you specified Python?

 Because I like Python ;-)
 The system behind generating the HTML code is written in Python.
.
.
.
That's a fine reason to use Python.  It helps me to know, though.

I do a lot of this sort of thing--automation of conversion between
different Web display-formats.  I don't have a one-line answer for
the particular one you describe, but it's certainly feasible.

I'm willing to bet there's an HTML-to-RTF converter available for
Linux, but I've never needed one (more accurately: I have written my
own for special purposes, and for my situations it hasn't been
difficult), so I can't say for sure.  My first step would be to
look for such an application.  Failing that, I'd script OpenOffice
(with Python!) to read the HTML, and SaveAs RTF.
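
One way the OpenOffice fallback might look from Python, sketched under assumptions: it shells out to a `soffice` binary and uses the `--headless --convert-to rtf --outdir` command-line options found in current OpenOffice/LibreOffice releases, which may not exist in the version discussed here (that would more likely need the UNO bridge). Depending on the version, an explicit export filter name may also be required for HTML input. The function names are hypothetical.

```python
import os
import shutil
import subprocess


def soffice_rtf_command(html_path, out_dir, binary="soffice"):
    """Build the argv for a headless HTML-to-RTF conversion."""
    return [binary, "--headless", "--convert-to", "rtf",
            "--outdir", out_dir, html_path]


def html_to_rtf(html_path, out_dir, binary="soffice"):
    """Run the conversion and return the expected path of the RTF file."""
    if shutil.which(binary) is None:
        raise RuntimeError("%s not found on PATH" % binary)
    subprocess.check_call(soffice_rtf_command(html_path, out_dir, binary))
    # soffice names the output after the input file, with the new extension.
    base = os.path.splitext(os.path.basename(html_path))[0]
    return os.path.join(out_dir, base + ".rtf")
```

Starting an office suite per document is slow, so for a CMS it would probably make sense to cache the generated RTF rather than convert on every request.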

I list a few PDF-to-RTF converters at
http://phaseit.net/claird/comp.text.pdf/PDF_converters.html#RTF .
Again, I think there are more, but haven't yet made the time to
hunt them all down.


Re: Html or Pdf to Rtf (Linux) with Python

2004-12-14 Thread Cameron Laird
In article [EMAIL PROTECTED],
Alexander Straschil  [EMAIL PROTECTED] wrote:
 Hello!

 I have to convert an HTML document to rtf with python, was just googling
 for an hour and did find nothing ;-(
 Has anybody an Idea how to convert (under Linux)  an HTML or Pdf Document
 to Rtf?

 Thanks, AXEL

Are you trying to convert one document in particular, or automate the
process of converting arbitrary HTML documents?  What computing host is
available to you--Win*?  Linux?  MacOS?  Solaris!?  Is Word installed?
OpenOffice?  Why have you specified Python?


Re: Html or Pdf to Rtf (Linux) with Python

2004-12-14 Thread Axel Straschil
Hello!

Sorry Cameron, I was replying, now my follow-up ;-):

 Are you trying to convert one document in particular, or automate the
 process of converting arbitrary HTML documents?

I have a small CMS where the customer has the possibility to view
certain HTML pages as PDF; the CMS is Python-based. I also thought about
passing the URL to an external converter script, but found nothing ;-(


 What computing host is available to you--Win*?  Linux?  MacOS?
 Solaris!?

Linux

 Is Word installed?

No.

 OpenOffice?

Yes.

 Why have you specified Python?

Because I like Python ;-)
The system behind generating the HTML code is written in Python.

Thanks,
AXEL.
