Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread Halaasz Saandor

2019/06/03 22:58 ... Tim Chase:

> The quality of the output depends largely on how the PDF was created,
> so I have some mostly-pure-text PDFs where it works great; and I have
> some PDFs that are full of graphics and poorly laid-out that are next
> to useless when piped through pdftotext.  YMMV.


And I have encountered PDFs that were really only collections of
photographed pages.


___
Lynx-dev mailing list
Lynx-dev@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lynx-dev


Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread Klaus-Peter Wegge

The benefit of using pdftohtml and reading the HTML document
with lynx is that all links (internal and external) are usable,
including links to the document pages (outline).
One problem is that pdftohtml does not recognise the reading-order tags
in a PDF (e.g. for text in two or more columns).
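A minimal sketch of doing this on the fly, without a temporary directory, assuming poppler's pdftohtml (which provides -stdout, -noframes and -i) and lynx's -stdin and -force_html options. The function name pdf2lynx is made up for illustration.

```sh
#!/bin/sh
# pdf2lynx (hypothetical name): show a PDF in lynx through a pipe, no temp dir.
# Assumes poppler's pdftohtml and a lynx that accepts -stdin and -force_html.
pdf2lynx() {
    if [ ! -r "$1" ]; then
        echo "pdf2lynx: cannot read '$1'" >&2
        return 1
    fi
    # -stdout: write the HTML to the pipe; -noframes: one HTML page;
    # -i: ignore images, so no image files are extracted
    pdftohtml -stdout -noframes -i "$1" | lynx -stdin -force_html
}

# When run as a script: sh pdf2lynx.sh document.pdf
if [ $# -gt 0 ]; then
    pdf2lynx "$1"
fi
```

Because images are skipped, this suits text-heavy documents; as noted below in the thread, pdftohtml options may differ between builds.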

Klaus


Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread Steffen Nurpmeso
Klaus-Peter Wegge wrote in :
 |On Tue, 4 Jun 2019, Steffen Nurpmeso wrote:
 |> russellb...@gmail.com wrote in <201906040449.x544niog005...@randytool.ne\
 |> t>:
 |>| I use:
 |>|
 |>| pdftotext -layout %s - | utf8trans UTFtoASCII
 |>
 |> Yes, Mr. Bell, luckily it has that -layout argument.
 |> Whereas mupdf (now) comes with mutool, which can "convert", that
 |> pdftotext from poppler with its -layout is the only PDF (and thus
 |> PS) converter i know who does an acceptable job.
  ...
 |Hi,
 |I'm reading most document formats with the help auf lynx and various
 |format2html tools like pdftohtml

I should perhaps have pointed out that I have corresponded directly
with Mr. Bell on another mailing list in the past, and I know him as
someone who hits the mark.  (He reported bugs in the software
I maintain.  Thanks again, Mr. Bell.)

lynx is a "swiss army knife" program, just like the shell.
I personally prefer plain text if I can.  Luckily all my senses
are in acceptable shape, so I do not need a Braille reader, nor
do I know how bad the result is when converting PS or PDF to plain
text or HTML.  All I can imagine is that converting via
ghostscript's text output is not the way to go.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread Chime Hart
Well, Steffen and all, thanks very much for a spirited discussion. I will try mupdf,
but most of the time I don't need pdftotext's -layout option as much as I
did years ago. And yes, for example, my HOA sends invoices as PDFs, but
what I mostly want to do is read an article on a news-related website where the
only option may be a PDF. It seems like a waste of time to download or save a
PDF and then hope to convert it. Sometimes all you see is a letter l. Thanks again

Chime



Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread Thorsten Glaser
Tim Chase dixit:

>If you have the "pdftotext" utility (part of my "poppler-utils"

pdftotext isn’t bad, but I almost always have X11+uxterm around
lynx anyway (for proper Unicode support), so I get along well
with:

$ fgrep mupdf /etc/lynx.cfg
DOWNLOADER:View in mupdf:mupdf '%s':FALSE:XWINDOWS

bye,
//mirabilos
-- 
15:39⎜«mika:#grml» mira|AO: "with XFree86® that wouldn't have happened" - muhaha
15:48⎜ so why do the xorg guys actually break everything? :)
15:49⎜ thkoehler: because as kids they were never allowed to knock over
the tower they had built themselves?  -- ~/.Xmodmap wonders…


Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread Klaus-Peter Wegge

Hi,
I'm reading most document formats with the help of lynx and various
format2html tools like pdftohtml.

Here is a short version of my viewer script:

---
#!/bin/sh

# per-invocation scratch directory, removed again at the end
dir=/tmp/$USER/viewer.$$
mkdir -p "$dir"
doc=$1

case $doc in
*.pdf)
    file=`basename "$doc" .pdf`
    echo "Portable Document Format: $doc"
    # pdftohtml writes several files; the _ind.html one is the start page
    html="$file"_ind.html
    pdftohtml -nodrm -hidden -enc Latin1 "$doc" "$dir/$file"
    show="$dir/$html";;
*.vcl)
    file=`basename "$doc" .vcl`
    echo "Calendar: $doc"
    html="$file.txt"
    vcal "$doc" >"$dir/$html"
    show="$dir/$html";;
*)
    echo "error: unsupported format: $doc"
    rm -rf "$dir"
    exit 1;;
esac

lynx -nolist file://localhost"$show"
rm -rf "$dir"

---
I have removed the cases for other formats.
Remark: the pdftohtml options seem to differ between Linux distributions.
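Since the options differ between builds, one way to cope is to probe the installed pdftohtml before calling it. A sketch, assuming the tool lists its options one per line in its -h output (poppler's version prints them on stderr, hence the 2>&1); the helper names supports_option and pdftohtml_opts and the PDFTOHTML variable are invented for this example.

```sh
#!/bin/sh
# supports_option CMD OPT (hypothetical helper): succeed if CMD's -h output
# lists OPT at the start of a line (after indentation), followed by whitespace.
supports_option() {
    "$1" -h 2>&1 | grep -q -e "^[[:space:]]*$2[[:space:]]"
}

# pdftohtml_opts (hypothetical helper): print the subset of the options used
# in the viewer script that this build accepts. PDFTOHTML (also hypothetical)
# may name a different binary; it defaults to pdftohtml.
pdftohtml_opts() {
    tool=${PDFTOHTML:-pdftohtml}
    opts=""
    for opt in -nodrm -hidden; do
        if supports_option "$tool" "$opt"; then
            opts="$opts $opt"
        fi
    done
    if supports_option "$tool" -enc; then
        opts="$opts -enc Latin1"
    fi
    # drop the leading space; printf avoids echo's option parsing
    printf '%s\n' "${opts# }"
}
```

In the viewer script the pdftohtml line could then read: pdftohtml $(pdftohtml_opts) "$doc" "$dir/$file" (the substitution is deliberately unquoted so each option becomes a separate word).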

Klaus


Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread Steffen Nurpmeso
russellb...@gmail.com wrote in <201906040449.x544niog005...@randytool.net>:
 | I use:
 |
 | pdftotext -layout %s - | utf8trans UTFtoASCII

Yes, Mr. Bell, luckily it has that -layout argument.
Although mupdf (now) comes with mutool, which can "convert",
pdftotext from poppler with its -layout argument is the only PDF
(and thus PS) converter I know of that does an acceptable job.

--steffen


Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread Steffen Nurpmeso
Tim Chase wrote in <20190603215813.5034d...@bigbox.christie.dr>:
 |If you have the "pdftotext" utility (part of my "poppler-utils"
 |package here on Debian), you might be able to either use it in your
 |mailcap
 |
 |  pdftotext "%s" - | less
 |
 |or create a shell-script:
 |
 |  #!/bin/sh
 |  pdftotext "$1" - | less
 |
 |and then spawn that shell-script in your mailcap file:
 |
 |  application/pdf; my_pdf_to_text.sh "%s"

And do not forget the -layout argument.
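Tim's wrapper with the -layout argument added might look like the sketch below. The script name my_pdf_to_text.sh is taken from his mailcap line; the function name pdf_to_text, the readability and pdftotext checks, and the fallback to less when PAGER is unset are assumptions of this example, not part of the original advice.

```sh
#!/bin/sh
# my_pdf_to_text.sh: page a PDF as text (sketch; pdf_to_text is a made-up name).
pdf_to_text() {
    if [ ! -r "$1" ]; then
        echo "my_pdf_to_text.sh: cannot read '$1'" >&2
        return 1
    fi
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "my_pdf_to_text.sh: pdftotext not found (poppler-utils)" >&2
        return 1
    fi
    # -layout keeps columns and tables roughly in place;
    # "-" as the output file sends the text to standard output
    pdftotext -layout "$1" - | ${PAGER:-less}
}

if [ $# -gt 0 ]; then
    pdf_to_text "$1"
fi
```

With the script on your PATH, the mailcap entry stays as Tim wrote it: application/pdf; my_pdf_to_text.sh "%s"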

--steffen


Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread David Woolley

On 04/06/2019 12:01, Mouse wrote:

> ...nor, apparently, that not everyone has Word.


Reading with Open Office (and without Microsoft fonts) is the common 
reason why the layout ends up broken.



Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread David Woolley

On 04/06/2019 11:57, Mouse wrote:

> I don't recall enough details to know whether FlateDecode's compression
> algorithm is close enough to any of the general-purpose compression


FlateDecode uses the core algorithm from gzip (and also PNG), but won't 
have the metadata.


DCTDecode uses the JPEG (discrete cosine transform) algorithm, again 
metadata will be outside the compressed stream.


There is also one for Group 4 fax (CCITTFaxDecode), which is what should
be, but often isn't, used for bi-level scans of documents, and older PDFs
have LZWDecode, which is the algorithm used by compress.
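The difference described above shows up in the first two bytes of a stream: a gzip file starts with the gzip header 1f 8b, a FlateDecode stream is a bare zlib stream whose header is usually 78 01, 78 5e, 78 9c or 78 da, and a DCTDecode stream is JPEG data starting with ff d8. The following sketch uses only od, tr and case; the name sniff_stream is hypothetical, and it assumes the bytes between the stream and endstream keywords have already been saved to a file.

```sh
#!/bin/sh
# sniff_stream FILE (hypothetical helper): guess what raw stream bytes are
# from their first two bytes. No PDF parsing is done here.
# 1f8b is the gzip magic; 78xx is a zlib header (deflate, 32K window) with
# one of the common flag bytes; ffd8 is the JPEG start-of-image marker.
sniff_stream() {
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    case $magic in
        1f8b) echo "gzip file" ;;
        7801|785e|789c|78da) echo "zlib stream (FlateDecode)" ;;
        ffd8) echo "JPEG (DCTDecode)" ;;
        *) echo "unknown: $magic" ;;
    esac
}
```

This is why a FlateDecode stream cannot be handed to gzip -d as-is: it carries a zlib header rather than a gzip one.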



Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread Mouse
>> Are people using pdf emails?

> Businesses will often send an email with [...] a PDF attachment.

> Less sophisticated businesses will often do this with Microsoft Word,
> not realising that they cannot predict the layout.

...nor, apparently, that not everyone has Word.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread Mouse
>>> Well, lynx said it may be a binary, see it anyway?  It was a mess.
>> Yes.  Most PDFs in my experience have most of their data compressed,
>> so they are "binary junk" when looked at with tools that don't
>> understand PDF structure and the compression method(s) in question.
> zless may be a better alternative since it does compressed data.

Not of much use here.  PDFs are not simply text files which have had a
general-purpose compression tool applied to them; they have internal
structure, and _some_ of the content gets compressed.

One PDF I have, for example, begins

%PDF-1.6
%âãÏÓ
5191 0 obj
<>stream

after which the "binary junk" begins.  A few KB later (3647 bytes, I
expect), I see

endstream
endobj
5192 0 obj
<>stream

and it's back to binary compressed data.

Other PDFs have more plaintext before the compressed data begins;
another one I checked has some sixty or seventy lines of plain text
before going into compressed data.

I don't recall enough details to know whether FlateDecode's compression
algorithm is close enough to any of the general-purpose compression
tools like gzip or compress to be of use, but even if it is, you would
at a minimum have to pick apart the PDF structure enough to extract the
compressed portion.  And, of course, FlateDecode is not the only
compression algorithm PDFs can use.
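A rough way to see which of these filters a given PDF actually uses is to grep its uncompressed dictionaries for /Filter entries. This sketch assumes a grep with -a and -o (GNU and BSD grep have both; they are not POSIX). The name list_filters is made up, only the first entry of a filter array is counted, and filters declared inside compressed object streams (PDF 1.5 and later) are invisible to it.

```sh
#!/bin/sh
# list_filters FILE (hypothetical helper): count the /Filter names declared
# in a PDF's plain-text dictionaries, most frequent first.
list_filters() {
    # -a: search the binary file as text; -o: print only the matching part.
    # Matches "/Filter/Name", "/Filter /Name" and "/Filter [/Name ...".
    grep -a -o '/Filter *\[\{0,1\} */[A-Za-z0-9]*' "$1" |
        sed 's|.*/||' |
        awk '{ n[$0]++ } END { for (f in n) print n[f], f }' |
        sort -k1,1nr -k2,2
}
```

For a PDF like the one above, output such as "2 FlateDecode" means two stream dictionaries declared that filter.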

For full details, of course, read the PDF spec.


Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread David Woolley

On 04/06/2019 11:38, Mike Marchywka wrote:

> Are people using pdf emails?


Businesses will often send an email with a formal business letter, with
all the proper letterheads, as a PDF attachment.  Invoices are often
done this way.


Less sophisticated businesses will often do this with Microsoft Word, 
not realising that they cannot predict the layout.



Re: [Lynx-dev] Displaying a pdf live on the Fly?

2019-06-04 Thread Mike Marchywka
On Mon, Jun 03, 2019 at 07:44:33PM -0700, Chime Hart wrote:
> Hi All: I am realizing it would be easier to have lynx display PDFs basically
> like any other text. Otherwise I must also run a PDF converter on the file
> or e-mail it to RoboBraille. Anyway, we tried modifying my .mailcap file,
> like this:
> application/pdf; less "%s"

Are people using PDF emails? I was curious about LaTeX email because the source
can be made more human-readable than HTML. I was using lynx to
try to convert the HTML layout info into logical, LaTeX-like
syntax for testing some stuff.
I don't really like non-text emails, but for things that absolutely need
to have some structure, logical or layout, LaTeX-like source would
probably be a better way to go than anything compiled into a binary display
format. With logical structure the viewer can expand or hide various blocks
as needed.

Right now I am just noting tags I found useful for later parsing
into logical structure, so I added things to the formatted output:


./lynx -cfg=./lynx.cfg -mjm=2 -dump -force-html ifn_clips.txt | grep "" | more
unknown option name EXTERNAL in ./lynx.cfg
   \br{}EC4Y 0AN, United Kingdom - Company No. 09901510 [Exact
   \p{This message contains My NCBI what's new results from the National
   (\url{http://www.ncbi.nlm.nih.gov/}NCBI) at the U.S. National Library
   of Medicine (\url{http://www.nlm.nih.gov/}NLM).
   \br{}Do not reply directly to this message.
   \p{Sender's message: Search: interferon
   \br{}Search: interferon
   \br{}
   \br{}\url{http://www.ncbi.nlm.nih.gov/myncbi/searches/1340461/1jGkfURVu
   \br{}
   \br{}\url{http://www.ncbi.nlm.nih.gov/myncbi/searches/1340461/}Edit

to help with things like the following, which is solely input for a viewer
I was trying to make:


 cat mail_clip_file.txt |  gawk -f  pubmed.awk  | more

\citation{   2. World J Biol Psychiatry. 2019 May 13:1-22. doi: 
10.1080/15622975.2019.1618494. [Epub ahead of print]}
\title{[16]Cytokine-mediated cellular immune activation in electroconvulsive 
therapy: A CSF study in patients with
treatment-resistant depression.}
\authors{   [17]Mindt S^1, [18]Neumaier M^1, [19]Hoyer C^2, [20]Sartorius A^3, 
[21]Kranaster L^3.
   Author information:
   1. a Institute for Clinical Chemistry, University Medical Centre Mannheim, 
Faculty of Medicine Mannheim,
   University of Heidelberg , Mannheim , Germany.
   2. b Department of Neurology , University Medical Centre Mannheim , Mannheim 
, Germany.
   3. c Department of Psychiatry and Psychotherapy , Central Institute of 
Mental Health, Medical Faculty
   Mannheim/Heidelberg University , Mannheim , Germany.}
\abstract{OBJECTIVES:
   Evidence points towar 

It is interesting, though, that while many test emails can be under 1k,
the headers can be 5x more than that with all the relays and spam
stuff, lol.


> Well, lynx said it may be a binary, see it anyway? It was a mess. So can
> someone please tell me an easy way of doing this, or would I need an
> external viewer? Thanks so much in advance
> Chime

-- 

mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchy...@hotmail.com
404-788-1216
ORCID: -0001-9237-455X
