Re: Html or Pdf to Rtf (Linux) with Python
Hello!

You might take a look at PyRTF in PyPI. It's still in beta, I think PyRTF would be the right choice, thanks. I just had a short look at it.

Regards,
AXEL.
--
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119 [http://ietf.org/rfc/rfc2119.txt]
--
http://mail.python.org/mailman/listinfo/python-list
Re: Html or Pdf to Rtf (Linux) with Python
On Fri, 17 Dec 2004 07:55:10 +0000 (UTC), Axel Straschil [EMAIL PROTECTED] wrote:
> Hello!
>
>> I've been able to successfully get konqueror to generate a pdf from a html file via dcop. It's something along the lines of:
>
> For that stuff, I'm using htmldoc (http://www.htmldoc.org/).

I found htmldoc and every other open-source, purpose-built HTML-to-PDF converter to be deficient enough to discourage us from using them. For our requirements, only web browsers had the quality of rendering required.

Stephen.
Re: Html or Pdf to Rtf (Linux) with Python
Axel Straschil [EMAIL PROTECTED] writes:
> Hello!
>
>> However, our company's product, PDFTextStream, does do a phenomenal job of extracting text and metadata out of PDF documents. It's crazy-fast, has a clean API, and in general gets the job done very nicely. It presents two points of compromise from your ideal situation:
>>
>> 1. It only produces text, so you would have to take the text it provides and write it out as an RTF yourself (there are tons of packages and tools that do this). Since the RTF format has pretty weak formatting capabilities compared
>
> I've got the input source in HTML; the problem is converting from anything to RTF. Please give me a hint where the tons of packages are.

That's easy. Load the HTML in MS Word, and save it as RTF. Script it via COM using the python win32all (I think that's what it's now called) package.

mike
--
Mike Meyer [EMAIL PROTECTED] http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Re: Html or Pdf to Rtf (Linux) with Python
On Thu, 16 Dec 2004 19:30:37 +0000 (UTC), Axel Straschil [EMAIL PROTECTED] wrote:
>> That's easy. Load the HTML in MS Word, and save it as RTF. Script it via COM using the python win32all (I think that's what it's now called) package.
>
> As I wrote in my posting and the subject: linux ;-) I could try to do this with OpenOffice, but I'm afraid this will not be a performant solution ;-( I really spent hours on that; the only thing I found was a specification for rich text, maybe a good point to start a project ...

I've been able to successfully get konqueror to generate a pdf from a html file via dcop. It's something along the lines of:

    % dcop konqueror-25827 html-widget1 print 1

You can launch konq in an xvfb (X Virtual Framebuffer), then communicate via dcop to send commands to the browser (load this url, print this page, etc).

I've been investigating doing the same feat using JS/XUL/etc in mozilla. It probably is possible. There's lots of documentation about the XPCOM api available from http://xulplanet.com/

As for converting to RTF, someone has already pointed out PyRTF.

Regards,
Stephen Thorne
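For anyone wanting to drive the konqueror approach from Python, here is a rough sketch that only builds the command lines. The dcop arguments are copied from the message above; `xvfb-run -a` is my assumption for how to start the browser under a virtual framebuffer, and the function names are made up for illustration. DCOP belongs to KDE 3, so this will not work unchanged on a current desktop.

```python
import subprocess


def konqueror_under_xvfb_command(url):
    """Command to start konqueror on a virtual X display.

    Assumption: the xvfb-run wrapper is installed; -a picks a free display.
    """
    return ["xvfb-run", "-a", "konqueror", url]


def dcop_print_command(app_id, widget="html-widget1"):
    """The dcop call from the message: dcop <app_id> <widget> print 1.

    app_id looks like "konqueror-<pid>"; running `dcop` with no
    arguments lists the registered applications.
    """
    return ["dcop", app_id, widget, "print", "1"]


def run(command):
    """Run one of the commands above and raise if it fails."""
    subprocess.run(command, check=True)
```

Something like `run(dcop_print_command("konqueror-25827"))` would then reproduce the shell line quoted above, once the browser has loaded the page.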
Re: Html or Pdf to Rtf (Linux) with Python
Hello!

> I've been able to successfully get konqueror to generate a pdf from a html file via dcop. It's something along the lines of:

For that stuff, I'm using htmldoc (http://www.htmldoc.org/).

Regards,
AXEL.
Re: Html or Pdf to Rtf (Linux) with Python
I haven't seen any solid responses come across the wire, and I suspect there isn't a product or package that will do exactly what you want.

<blatant_self_promotion>

However, our company's product, PDFTextStream, does do a phenomenal job of extracting text and metadata out of PDF documents. It's crazy-fast, has a clean API, and in general gets the job done very nicely. It presents two points of compromise from your ideal situation:

1. It only produces text, so you would have to take the text it provides and write it out as an RTF yourself (there are tons of packages and tools that do this). Since the RTF format has pretty weak formatting capabilities compared to PDF (and even compared to HTML+CSS), you'd likely never reproduce the original layout/content of the source document anyway.

2. It is a Java library. You indicated in a later message that you were aiming to use a python package if possible just out of personal preference. Assuming such a thing does not exist, and you are able to introduce a Java component to your project, this would become a non-issue.

</blatant_self_promotion>

Let me know what your questions are.

Chas Emerick [EMAIL PROTECTED]
Snowtide Informatics Systems
PDFTextStream: fast PDF text extraction for Java apps and Lucene
http://snowtide.com/home/PDFTextStream/

Alexander Straschil wrote:
> Hello!
>
> I have to convert an HTML document to RTF with Python. I was just googling for an hour and found nothing ;-(
> Does anybody have an idea how to convert (under Linux) an HTML or PDF document to RTF?
>
> Thanks,
> AXEL
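On point 1 (taking plain text and writing it out as RTF yourself): a minimal sketch in Python of what that can look like without any package. The function names are hypothetical, and the header is about the smallest one I would expect a reader to accept; it produces unformatted paragraphs only, which matches the "text only" compromise described above.

```python
def rtf_escape(text):
    """Escape plain text so it can sit inside an RTF group."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)      # characters RTF itself uses as syntax
        elif ch == "\n":
            out.append("\\line ")      # forced line break inside a paragraph
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes signed 16-bit UTF-16 code units; "?" is the fallback
            # character for readers that do not understand \u.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)


def text_to_rtf(text):
    """Wrap plain text in a minimal RTF document, one \\par per blank-line-separated paragraph."""
    paragraphs = [p for p in text.split("\n\n") if p]
    body = "".join("\\pard " + rtf_escape(p) + "\\par\n" for p in paragraphs)
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
    return header + body + "}"
```

The result is pure ASCII, so it can be written with `open(path, "w", encoding="ascii")`.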
Re: Html or Pdf to Rtf (Linux) with Python
Hello!

> However, our company's product, PDFTextStream, does do a phenomenal job of extracting text and metadata out of PDF documents. It's crazy-fast, has a clean API, and in general gets the job done very nicely. It presents two points of compromise from your ideal situation:
>
> 1. It only produces text, so you would have to take the text it provides and write it out as an RTF yourself (there are tons of packages and tools that do this). Since the RTF format has pretty weak formatting capabilities compared

I've got the input source in HTML; the problem is converting from anything to RTF. Please give me a hint where the tons of packages are.

Thanks,
AXEL.
Re: Html or Pdf to Rtf (Linux) with Python
In article [EMAIL PROTECTED], Axel Straschil [EMAIL PROTECTED] wrote:
> Hello!
>
> Sorry Cameron, I was replying, now my follow-up ;-):
>
>> Are you trying to convert one document in particular, or automate the process of converting arbitrary HTML documents?
>
> I have a small CMS where the customer has the possibility to view certain HTML pages as PDF; the CMS is Python-based. I also thought about passing the URL to an external converter script, but found nothing ;-(
>
>> What computing host is available to you--Win*? Linux? MacOS? Solaris!?
>
> Linux
>
>> Is Word installed?
>
> No.
>
>> OpenOffice?
>
> Yes.
>
>> Why have you specified Python?
>
> Because I like Python ;-) The system behind generating the HTML code is written in Python.

. . .

That's a fine reason to use Python. It helps me to know, though.

I do a lot of this sort of thing--automation of conversion between different Web display-formats. I don't have a one-line answer for the particular one you describe, but it's certainly feasible. I'm willing to bet there's an HTML-to-RTF converter available for Linux, but I've never needed one (more accurately: I have written my own for special purposes--for my situations, it hasn't been difficult), so I can't say for sure.

My first step would be to look for such an application. Failing that, I'd script OpenOffice (with Python!) to read the HTML, and SaveAs RTF.

I list a few PDF-to-RTF converters in URL: http://phaseit.net/claird/comp.text.pdf/PDF_converters.html#RTF . Again, I think there are more, but haven't yet made the time to hunt them all down.
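A rough sketch of the "script OpenOffice to read the HTML and save as RTF" idea, driven from Python. This is not the UNO scripting the message may have had in mind: it assumes a later LibreOffice or OpenOffice build whose `soffice` binary accepts `--headless --convert-to`, which the 2004-era releases did not. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def convert_command(html_path, out_dir, soffice="soffice"):
    """Build the headless conversion command (assumes --convert-to is supported)."""
    return [
        soffice,
        "--headless",
        "--convert-to", "rtf",
        "--outdir", str(out_dir),
        str(html_path),
    ]


def html_to_rtf(html_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the RTF file is expected."""
    subprocess.run(convert_command(html_path, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(html_path).stem + ".rtf")
```

Some builds open HTML in the Writer/Web module and may then want an explicit export filter name after `rtf:`; I have not checked which ones. Starting the office suite for every request is slow, so for a CMS it would probably pay to convert in batches or cache the results.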
Re: Html or Pdf to Rtf (Linux) with Python
In article [EMAIL PROTECTED], Alexander Straschil [EMAIL PROTECTED] wrote:
> Hello!
>
> I have to convert an HTML document to RTF with Python. I was just googling for an hour and found nothing ;-(
> Does anybody have an idea how to convert (under Linux) an HTML or PDF document to RTF?
>
> Thanks,
> AXEL

Are you trying to convert one document in particular, or automate the process of converting arbitrary HTML documents?

What computing host is available to you--Win*? Linux? MacOS? Solaris!?

Is Word installed? OpenOffice?

Why have you specified Python?
Re: Html or Pdf to Rtf (Linux) with Python
Hello!

Sorry Cameron, I was replying, now my follow-up ;-):

> Are you trying to convert one document in particular, or automate the process of converting arbitrary HTML documents?

I have a small CMS where the customer has the possibility to view certain HTML pages as PDF; the CMS is Python-based. I also thought about passing the URL to an external converter script, but found nothing ;-(

> What computing host is available to you--Win*? Linux? MacOS? Solaris!?

Linux

> Is Word installed?

No.

> OpenOffice?

Yes.

> Why have you specified Python?

Because I like Python ;-) The system behind generating the HTML code is written in Python.

Thanks,
AXEL.