Kay,

I've met one of the Devon Tech developers in Malta :-) Small world, ain't
it? A Brazilian and a Bulgarian, meeting in Malta at the house of Sims and
Cloe.

My needs are simple. I put online versions of some parts of magazines for
the Himalayan Academy Publications (www.himalayanacademy.com). We're
interested in the content, not in the presentation layer, so to extract that
data I tried some PDF conversion tools and tried parsing the PDF myself. As
Bryan found, many documents are made for humans, and computers have no way
to make sense of them. So in the end I created a little stack tool that lets
me place the content in fields and then generate the HTML to be put online
using XSLT. I am lucky to have an old version of Adobe InDesign that my
university allowed me to install as a monitor. Sivakatirswami sends me an
IDX file, which is an XML InDesign Exchange file. I open this file in
InDesign and use a mix of AppleScripts to get the data out of InDesign.
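
To illustrate the "fields, then XSLT" step: a minimal sketch in Python (not
the Revolution stack I actually use) of how field contents could be
serialized to XML so that a stylesheet can turn them into HTML. The element
names "article", "title" and "body" and the function name fields_to_xml are
made up for this example.

```python
import xml.etree.ElementTree as ET


def fields_to_xml(fields: dict) -> str:
    """Serialize a mapping of field name -> field text as a small XML document.

    The root element name "article" is a hypothetical choice; an XSLT
    stylesheet would match on whatever names are used here.
    """
    article = ET.Element("article")
    for name, value in fields.items():
        child = ET.SubElement(article, name)
        child.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    print(fields_to_xml({"title": "A & B", "body": "Some text"}))
```

Keeping the content in plain XML like this is what separates it from the
presentation layer: the same data can be fed to a different stylesheet later.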

It's the same approach I described before. One AppleScript gets the selected
text in InDesign. The Adobe application does not return the text itself but
a pair of chunk positions, like "char 100 to 10024 of box 23 of document 1";
then another AppleScript uses those chunk positions to get the actual text.
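
As a rough illustration of what "using the chunk positions" means, here is a
small Python sketch (my real workflow does this in AppleScript). It assumes
the reference always has exactly the shape of the example above and that the
positions are 1-based and inclusive; both are assumptions, and the names
ChunkRef, parse_chunk_ref and slice_text are invented for the example.

```python
import re
from typing import NamedTuple


class ChunkRef(NamedTuple):
    first_char: int
    last_char: int
    box: int
    document: int


# Pattern modeled on the single example string quoted in this message.
_CHUNK_RE = re.compile(r"char (\d+) to (\d+) of box (\d+) of document (\d+)")


def parse_chunk_ref(ref: str) -> ChunkRef:
    """Split a chunk reference string into its four numbers."""
    match = _CHUNK_RE.fullmatch(ref.strip())
    if match is None:
        raise ValueError(f"unrecognized chunk reference: {ref!r}")
    return ChunkRef(*(int(group) for group in match.groups()))


def slice_text(box_text: str, ref: ChunkRef) -> str:
    """Return the referenced characters, assuming 1-based inclusive positions."""
    return box_text[ref.first_char - 1 : ref.last_char]


if __name__ == "__main__":
    ref = parse_chunk_ref("char 100 to 10024 of box 23 of document 1")
    print(ref.first_char, ref.last_char, ref.box, ref.document)
```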

This approach works really well for me: my stack replaces Unicode entities
and strange characters and transforms everything automatically. At first I
used PDFs, but selecting multi-column text in Preview mixed up the text when
AppleScript picked it, so I ditched that in favor of InDesign files.
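
The cleanup step could be sketched like this in Python. This is only an
approximation of what my stack does: it assumes "entities" means HTML/XML
character references and "strange chars" means control characters, and the
function name clean_text is made up.

```python
import html
import unicodedata


def clean_text(raw: str) -> str:
    """Decode character references and drop stray control characters."""
    # Turn references such as &#8217; or &amp; into the characters they name.
    text = html.unescape(raw)
    # Normalize old Mac and Windows line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Keep newlines and tabs, drop every other control character (category Cc).
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )


if __name__ == "__main__":
    print(clean_text("It&#8217;s a test\x0b"))
```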

If you're not selecting multi-column text, AppleScript + Preview works fine.
Be aware that the trouble you describe with Revolution's clipboard text will
not affect AppleScript workflows like this.

I don't have much experience with pdf2html programs. I know they exist,
which is why I told Bryan to investigate, but so far I have managed to solve
my troubles using the raw InDesign source files.

The idea of using the clipboard or AppleScript is to make the user more
productive, since selecting is faster than typing and less error-prone.

Cheers
andre

On 7/6/07, Kay C Lan <[EMAIL PROTECTED]> wrote:

On 7/6/07, Andre Garzia <[EMAIL PROTECTED]> wrote:
>
> A system agnostic approach would be to ask the user to select and copy the
> title to the clipboard, this way, you just need to check
> clipboarddata["text"] to get your title.


Unfortunately Rev's inter-app clipboard transfer ability is less than
stellar. Sometimes accessing the data via script works, a lot of the time it
doesn't; sometimes keyboard shortcuts work, sometimes they don't. Using menu
items to cut and paste works consistently. Again, this only refers to
transferring clipboardData from other apps to Rev.

Andre, can you provide a little more info on pdf2html command line tools for
OSX? I did a quick search of the man pages (using ManOpen) and the only
reference that came up for pdf was snmpdf, which of course has nothing to do
with PDFs. I did a search on Google for OSX command line PDF tools and came
up with commercial products:

http://www.pdf-tools.com/asp/products.asp?name=CLS&type=shell

selling for $360 a pop!!!

If the tool already comes with OSX, how can I find out the command syntax
and options?

I currently use the free and excellent PDF2RTFService

http://www.devon-technologies.com/products/freeware/services.html

but I can't see anywhere how to access this via the command line.

I work with lots and lots of PDFs, but because the clipboardData is
unreliable, I currently work in two steps. I use Automator to open all the
PDF files with TextEdit (PDF2RTFService takes care of the conversion) and
save them as plain text. Then I use Rev to open and read the files and do
the real work.

Thanks
_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
