Re: [GTALUG] CZUR scanners under Linux

2022-11-15 Thread Kevin Cozens via talk

On 2022-11-10 02:21, D. Hugh Redelmeier via talk wrote:

> | Is there any support for it in VueScan?
>
> I don't know VueScan.

It is similar in idea to XSane. It supports a lot of (old/obsolete)
scanners. I can't use XSane to scan slides on my HP G4010 because it doesn't
turn on the light in the lid. VueScan does. The downside to VueScan is that
it is more of a commercial product. There are versions you can use for free,
but they may add a watermark to the scanned images. To use it for scanning
slides without watermarks I would need to pay $149.


--
Cheers!

Kevin.

http://www.ve3syb.ca/   | "Nerds make the shiny things that
https://www.patreon.com/KevinCozens | distract the mouth-breathers, and
| that's why we're powerful"
Owner of Elecraft K2 #2172  |
#include  | --Chris Hardwick

---
Post to this mailing list talk@gtalug.org
Unsubscribe from this mailing list https://gtalug.org/mailman/listinfo/talk


Re: [GTALUG] CZUR scanners under Linux

2022-11-14 Thread Stewart Russell via talk
On Mon, Nov 14, 2022 at 12:25 PM Peter King via talk wrote:

>
> One of the ways in which OCR contributes real value is if you have a large
> number of documents that are idiosyncratic in the same way ...  If anyone
> knows of anything open-source that works reasonably well, I'd love to hear
> about it.
>

For all that Tesseract is a mass-ingestion OCR tool, it can be fine-tuned.
Whether there are user-friendly tools for training it, I don't know. I'd
really like a tool that would pause Tesseract on matches below a certain
confidence threshold and let me manually control what gets stored in the
text.
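
That kind of confidence gate can be approximated outside Tesseract. What
follows is only a rough sketch, not an existing tool. It assumes the TSV
output that `tesseract page.png stdout tsv` prints (one row per word, with
`conf` and `text` columns). The function names, the threshold of 60, and the
sample rows are all made up for illustration.

```python
import csv
import io

# Column names of Tesseract's TSV output (the header row of that output).
TSV_COLUMNS = ["level", "page_num", "block_num", "par_num", "line_num",
               "word_num", "left", "top", "width", "height", "conf", "text"]


def word_rows(tsv_text):
    """Yield (line_key, word, confidence) for each recognized word."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        word = (row.get("text") or "").strip()
        conf = float(row["conf"])
        if not word or conf < 0:  # layout rows carry conf == -1 and no text
            continue
        key = tuple(int(row[k]) for k in
                    ("page_num", "block_num", "par_num", "line_num"))
        yield key, word, conf


def low_confidence_words(tsv_text, threshold):
    """List the (word, confidence) pairs that fall below the threshold."""
    return [(w, c) for _, w, c in word_rows(tsv_text) if c < threshold]


def review(tsv_text, threshold, ask):
    """Rebuild the text, calling ask(word, conf) for each doubtful word.

    `ask` is a hypothetical callback; in an interactive tool it could
    wrap input() and show the word's bounding box to the user.
    """
    lines = {}
    for key, word, conf in word_rows(tsv_text):
        if conf < threshold:
            word = ask(word, conf)
        lines.setdefault(key, []).append(word)
    return "\n".join(" ".join(words) for words in lines.values())


# Invented sample rows in the TSV layout, standing in for real OCR output.
SAMPLE_ROWS = [
    [4, 1, 1, 1, 1, 0, 10, 10, 200, 20, -1, ""],
    [5, 1, 1, 1, 1, 1, 10, 10, 60, 20, 96.2, "Freie"],
    [5, 1, 1, 1, 1, 2, 75, 10, 60, 20, 41.5, "Stabt"],
    [5, 1, 1, 1, 1, 3, 140, 10, 70, 20, 88.0, "Danzig"],
]
SAMPLE_TSV = "\n".join("\t".join(str(c) for c in row)
                       for row in [TSV_COLUMNS] + SAMPLE_ROWS)

if __name__ == "__main__":
    for word, conf in low_confidence_words(SAMPLE_TSV, 60.0):
        print(f"check by hand: {word!r} (confidence {conf})")
```

Passing a function built on input() as `ask` would give the stop-and-correct
behaviour. Whether word-level confidence is a reliable enough signal for
oddly printed books is something I have not tested.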

A few years ago Tesseract was used to create a searchable archive of all
available documentation from the Free City of Danzig, the short-lived city
state that existed from 1920 to 1939 in what is now Gdańsk, Poland. Most of
the paperwork (and there was a *lot*: they were very big on public
participation in deciding how they were going to be run) was printed in
Fraktur (aka blackletter, gothic or textura). Tesseract was trained to read
this script, and now the parameters live in the 'tesseract-ocr-frk' package
for all to use. I wish they could have done the same for the
then-contemporary handwritten script of Sütterlin, one of the great "go home
you're drunk" cursives.

For very automatic OCR on Linux, the ocrmypdf tool is quite amazing. Great
way of stress-testing your hardware, too.
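
A minimal sketch of driving ocrmypdf from a script, assuming it is installed
and on the PATH. The file names are placeholders, and the `frk` language code
assumes the Fraktur data from the 'tesseract-ocr-frk' package is installed.
By default ocrmypdf uses every CPU core it can find, so `--jobs` is the knob
to turn if you would rather not stress-test the machine.

```python
import shutil
import subprocess


def build_ocrmypdf_command(src, dst, language="frk", jobs=None, deskew=True):
    """Assemble an ocrmypdf command line as a list of arguments."""
    cmd = ["ocrmypdf", "--language", language]
    if deskew:
        cmd.append("--deskew")          # straighten crooked page scans
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]    # cap the number of CPU cores used
    cmd += [src, dst]
    return cmd


def run_ocr(src, dst, **options):
    """Run ocrmypdf, raising if it is missing or exits with an error."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocrmypdf_command(src, dst, **options), check=True)


if __name__ == "__main__":
    # Placeholder file names; print the command rather than running it.
    print(" ".join(build_ocrmypdf_command("scan.pdf", "scan-ocr.pdf", jobs=2)))
```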

 Stewart


Re: [GTALUG] CZUR scanners under Linux

2022-11-14 Thread Peter King via talk
I'm very grateful to this thread -- I was getting ready to order a CZUR 
scanner in the expectation that it would work reasonably well under 
Linux.  So much for those expectations.


One of the ways in which OCR contributes real value is if you have a 
large number of documents that are idiosyncratic in the same way -- then 
you can teach it how to recognize the characters.  I have lots of old 
books with such oddities, and OCR that is optimized for mass use -- 
read: business paperwork -- just doesn't cut it. If anyone knows of 
anything open-source that works reasonably well, I'd love to hear about it.


--
Peter King  peter.k...@utoronto.ca
Department of Philosophy
170 St. George Street #521
The University of Toronto  (416)-946-3170 ofc
Toronto, ON  M5R 2M8
   CANADA

http://individual.utoronto.ca/pking/

=
GPG keyID 0x7587EC42 (2B14 A355 46BC 2A16 D0BC  36F5 1FE6 D32A 7587 EC42)
gpg --keyserver pgp.mit.edu --recv-keys 7587EC42





Re: [GTALUG] CZUR scanners under Linux

2022-11-14 Thread Alvin Starr via talk

On 2022-11-14 08:40, Stewart C. Russell via talk wrote:
> On 10/11/2022 02.21, D. Hugh Redelmeier via talk wrote:
>
>> Apparently the scan under MacOS (and probably under Windows)
>> has better OCR than under Linux.  Grr.
>
> We're probably stuck with Tesseract, which -- while it's much better
> than it used to be -- is now optimized for mass "good enough"
> recognition of simple pages. Omnipage dropped its Linux support years
> ago, and Abbyy Finereader's Linux support is only for ($$$)
> enterprise. Adobe's now the monster of OCR, but of course it's only
> built into its rented Acrobat Pro platform.
>
> It's a shame that Linux users don't get the nice things that come with
> hardware that we buy. The page remapping and finger editing-out sound
> very handy.

There are a number of cloud OCR solutions.

I have not tested them, but I would bet they are of good quality.
Of course, the trade-off is that you're making your data available for the
cloud provider to monetize, along with analysis by the world's various
security services.


A few years ago I tested various text-to-speech solutions, and in the end
the only ones of quality that were not insanely expensive were the cloud
providers.
Initially I was using the Google TTS that was bundled into Chrome, but
that got closed down, so I ended up with the fee-based service.

Still, the quality was way better than anything we could buy.

My guess is that OCR will go that way.
The hardware manufacturers will bundle some white-labeled cloud service
that is somehow limited or hobbled and subject to upsell.



--
Alvin Starr   ||   land:  (647)478-6285
Netvel Inc.   ||   Cell:  (416)806-0133
al...@netvel.net  ||



Re: [GTALUG] CZUR scanners under Linux

2022-11-14 Thread Stewart C. Russell via talk

On 10/11/2022 02.21, D. Hugh Redelmeier via talk wrote:

> Apparently the scan under MacOS (and probably under Windows)
> has better OCR than under Linux.  Grr.


We're probably stuck with Tesseract, which — while it's much better than 
it used to be — is now optimized for mass "good enough" recognition of 
simple pages. Omnipage dropped its Linux support years ago, and Abbyy 
Finereader's Linux support is only for ($$$) enterprise. Adobe's now the 
monster of OCR, but of course it's only built into its rented Acrobat 
Pro platform.


It's a shame that Linux users don't get the nice things that come with 
hardware that we buy. The page remapping and finger editing-out sound 
very handy.


 Stewart

