Re: [OT] scanned files are large in size

2019-01-04 Thread David Wright
On Fri 04 Jan 2019 at 17:26:07 (+), Brian wrote:
> On Wed 02 Jan 2019 at 22:56:22 -0500, kamaraju kusumanchi wrote:
> > On Wed, Jan 2, 2019 at 9:23 PM David Wright wrote:
> > > On Wed 02 Jan 2019 at 14:44:14 (+), Brian wrote:
> > > >
> > > > I'm intrigued; I hadn't realised that conversion of the scanned image
> > > > for some vendors' devices took place on the device itself. How do you
> > > > know this happens? It is the frontend to SANE (xsane or scanimage, for
> > > > example) which I've always associated with image acquisition conversion.
> > >
> > > It really is rather easy. You insert a USB stick into the scanner,
> > > press scan, and later observe that a JPEG or PDF file has appeared
> > > on the stick, as appropriate.
> > 
> > Yes, that is precisely what I did. Stick a USB into the scanner and
> > press the scan button.
> 
> My HP Envy 4520 has no such button. There is an option for scanning to
> the computer, but software is required on the computer to do that and
> HPLIP does not provide it.
> 
> Anyway, I managed to persuade the device to give me the PDF it would
> have sent to a USB stick if the facility had existed (the device has
> Apple's AirScan). If it matters, the PDF does not have any Creator or
> Publisher information and doesn't contain any embedded or subset fonts.

It sounds as if this is sufficient to make you confident that the
device is doing the conversion and not the computer: anything that
decouples the two from privately passing information to one another
outside the delivered file. A USB stick, or email, is just the most
obvious.

> Scanned at a resolution of 600:
> 
> brian@desktop:~$ pdfimages -list out.pdf
> page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
> --------------------------------------------------------------------------------------------
>    1     0 image    5100  6600  gray    1   8  jpeg   no         1  0   600   600 2090K 6.4%
> 
> ps2pdf reduces the 2090K by about 50% to 1051K.
> 
> A different scanner device and source document, of course, and maybe
> different methods of PDF production, so I wouldn't read too much into
> this.

Proving whether any compression applied is lossless is more difficult
because pdfimages seems mute on what processes were carried out in
extracting an image from the PDF. I have made the assumption that
scanning compressed means that lossy compression is applied whereas
scanning "uncompressed" means that lossless compression is applied.

Cheers,
David.



Re: [OT] scanned files are large in size

2019-01-04 Thread Brian
On Fri 04 Jan 2019 at 20:35:47 +0100, deloptes wrote:

> Brian wrote:
> 
> > and doesn't contain any embedded or subset fonts
> 
> not heard that such are required for a jpeg or whatever image format
> embedded in pdf file

I reported. I am not pursuing it further. Neither are you, I think.

-- 
Brian.



Re: [OT] scanned files are large in size

2019-01-04 Thread Brian
On Fri 04 Jan 2019 at 13:41:50 -0500, Gene Heskett wrote:

> On Friday 04 January 2019 12:26:07 Brian wrote:
> 
> > On Wed 02 Jan 2019 at 22:56:22 -0500, kamaraju kusumanchi wrote:
> > > > On Wed, Jan 2, 2019 at 9:23 PM David Wright wrote:
> > > > On Wed 02 Jan 2019 at 14:44:14 (+), Brian wrote:
> > > > > I'm intrigued; I hadn't realised that conversion of the scanned
> > > > > image for some vendors' devices took place on the device itself.
> > > > > How do you know this happens? It is the frontend to SANE (xsane
> > > > > or scanimage, for example) which I've always associated with
> > > > > > image acquisition conversion.
> > > >
> > > > It really is rather easy. You insert a USB stick into the scanner,
> > > > press scan, and later observe that a JPEG or PDF file has appeared
> > > > on the stick, as appropriate.
> > >
> > > Yes, that is precisely what I did. Stick a USB into the scanner and
> > > press the scan button.
> >
> > My HP Envy 4520 has no such button. There is an option for scanning to
> > the computer, but software is required on the computer to do that and
> > HPLIP does not provide it.
> >
> > Anyway, I managed to persuade the device to give me the PDF it would
> > have sent to a USB stick if the facility had existed (the device has
> > Apple's AirScan). If it matters, the PDF does not have any Creator or
> > Publisher information and doesn't contain any embedded or subset
> > fonts.
> >
> > Scanned at a resolution of 600:
> >
> > brian@desktop:~$ pdfimages -list out.pdf
> > page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
> > --------------------------------------------------------------------------------------------
> >    1     0 image    5100  6600  gray    1   8  jpeg   no         1  0   600   600 2090K 6.4%
> >
> > ps2pdf reduces the 2090K by about 50% to 1051K.
> >
> > A different scanner device and source document, of course, and maybe
> > different methods of PDF production, so I wouldn't read too much into
> > this.
> >
> > BTW (for completeness), what machine was scanned_in_office.pdf
> > produced on?
> 
> If I take a screen snapshot that might be of interest to my bunch, I 
> usually run it thru gimp, exporting it as a jpeg, increasing the 
> compression until I start to see artifacts/errors in the preview image, 
> then go back up in size till I can't see them anymore, then export to a 
> more understandable english name.  By this method I have pulled in an 
> image from my camera that was a gigabyte+ when unpacked from its "jpeg" 
> output, and smunched it down to 2 or 3 hundred kilobytes for sending 
> over the net. And I'm still sending a far higher quality of image than 
> I've ever received from a winders machine sending me 25k jpegs. The 
> proper description of those when being kind is fugly.

Perhaps I am missing something, but this appears to have nothing to
do with my post. If you missed the essentialness in this thread, it
is about scanning, not screen snapshots and cameras.

Please try to keep up.

-- 
Brian.



Re: [OT] scanned files are large in size

2019-01-04 Thread deloptes
Brian wrote:

> and doesn't contain any embedded or subset fonts

not heard that such are required for a jpeg or whatever image format
embedded in pdf file



Re: [OT] scanned files are large in size

2019-01-04 Thread Gene Heskett
On Friday 04 January 2019 12:26:07 Brian wrote:

> On Wed 02 Jan 2019 at 22:56:22 -0500, kamaraju kusumanchi wrote:
> > > On Wed, Jan 2, 2019 at 9:23 PM David Wright wrote:
> > > On Wed 02 Jan 2019 at 14:44:14 (+), Brian wrote:
> > > > I'm intrigued; I hadn't realised that conversion of the scanned
> > > > image for some vendors' devices took place on the device itself.
> > > > How do you know this happens? It is the frontend to SANE (xsane
> > > > or scanimage, for example) which I've always associated with
> > > > image acquisition conversion.
> > >
> > > It really is rather easy. You insert a USB stick into the scanner,
> > > press scan, and later observe that a JPEG or PDF file has appeared
> > > on the stick, as appropriate.
> >
> > Yes, that is precisely what I did. Stick a USB into the scanner and
> > press the scan button.
>
> My HP Envy 4520 has no such button. There is an option for scanning to
> the computer, but software is required on the computer to do that and
> HPLIP does not provide it.
>
> Anyway, I managed to persuade the device to give me the PDF it would
> have sent to a USB stick if the facility had existed (the device has
> Apple's AirScan). If it matters, the PDF does not have any Creator or
> Publisher information and doesn't contain any embedded or subset
> fonts.
>
> Scanned at a resolution of 600:
>
> brian@desktop:~$ pdfimages -list out.pdf
> page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
> --------------------------------------------------------------------------------------------
>    1     0 image    5100  6600  gray    1   8  jpeg   no         1  0   600   600 2090K 6.4%
>
> ps2pdf reduces the 2090K by about 50% to 1051K.
>
> A different scanner device and source document, of course, and maybe
> different methods of PDF production, so I wouldn't read too much into
> this.
>
> BTW (for completeness), what machine was scanned_in_office.pdf
> produced on?

If I take a screen snapshot that might be of interest to my bunch, I 
usually run it thru gimp, exporting it as a jpeg, increasing the 
compression until I start to see artifacts/errors in the preview image, 
then go back up in size till I can't see them anymore, then export to a 
more understandable english name.  By this method I have pulled in an 
image from my camera that was a gigabyte+ when unpacked from its "jpeg" 
output, and smunched it down to 2 or 3 hundred kilobytes for sending 
over the net. And I'm still sending a far higher quality of image than 
I've ever received from a winders machine sending me 25k jpegs. The 
proper description of those when being kind is fugly.

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page 



Re: [OT] scanned files are large in size

2019-01-04 Thread Brian
On Wed 02 Jan 2019 at 22:56:22 -0500, kamaraju kusumanchi wrote:

> On Wed, Jan 2, 2019 at 9:23 PM David Wright  wrote:
> >
> > On Wed 02 Jan 2019 at 14:44:14 (+), Brian wrote:
> > >
> > > I'm intrigued; I hadn't realised that conversion of the scanned image
> > > for some vendors' devices took place on the device itself. How do you
> > > know this happens? It is the frontend to SANE (xsane or scanimage, for
> > > > example) which I've always associated with image acquisition conversion.
> >
> > It really is rather easy. You insert a USB stick into the scanner,
> > press scan, and later observe that a JPEG or PDF file has appeared
> > on the stick, as appropriate.
> >
> 
> Yes, that is precisely what I did. Stick a USB into the scanner and
> press the scan button.

My HP Envy 4520 has no such button. There is an option for scanning to
the computer, but software is required on the computer to do that and
HPLIP does not provide it.

Anyway, I managed to persuade the device to give me the PDF it would
have sent to a USB stick if the facility had existed (the device has
Apple's AirScan). If it matters, the PDF does not have any Creator or
Publisher information and doesn't contain any embedded or subset fonts.

Scanned at a resolution of 600:

brian@desktop:~$ pdfimages -list out.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    5100  6600  gray    1   8  jpeg   no         1  0   600   600 2090K 6.4%

ps2pdf reduces the 2090K by about 50% to 1051K.

A different scanner device and source document, of course, and maybe
different methods of PDF production, so I wouldn't read too much into
this.
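
As a rough cross-check of the pdfimages figures above, the "ratio" column can be reproduced by dividing the stored size by the size of the uncompressed raster. This is only a sketch: it assumes "2090K" means 2090 x 1024 bytes (the listing rounds the size, so the exact byte count is not known), and the function name is made up for illustration.

```python
def raw_image_bytes(width, height, components, bits_per_component):
    """Size of the uncompressed raster in bytes."""
    return width * height * components * bits_per_component // 8


# Values from the pdfimages listing: 5100 x 6600 pixels, gray (1 component), 8 bpc.
raw = raw_image_bytes(5100, 6600, 1, 8)

# Assumption: the "K" in the size column is 1024 bytes.
stored = 2090 * 1024

ratio_percent = 100 * stored / raw
print(f"{raw} bytes raw, stored at {ratio_percent:.1f}% of that")
# prints "33660000 bytes raw, stored at 6.4% of that"
```

At 600 ppi, 5100 x 6600 pixels corresponds to an 8.5 x 11 inch page, so the JPEG inside the PDF is roughly one sixteenth of the raw scan.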

BTW (for completeness), what machine was scanned_in_office.pdf produced
on?

-- 
Brian.



Re: [OT] scanned files are large in size

2019-01-04 Thread Jonathan Dowland

On Thu, Jan 03, 2019 at 02:42:10PM -0600, David Wright wrote:

> But what a disappointment that you didn't get simple tags and
> raw data.


Yes, indeed it seems these particular printers will *always* give you
JPEG, what changes is the wrapper.

--

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Jonathan Dowland
⢿⡄⠘⠷⠚⠋⠀ https://jmtd.net
⠈⠳⣄ Please do not CC me, I am subscribed to the list.



Re: [OT] scanned files are large in size

2019-01-03 Thread David Wright
On Thu 03 Jan 2019 at 13:43:40 (+), Jonathan Dowland wrote:
> I'm replying to the top-level of this thread because it's not a direct
> reply to any particular message, but the thread reminded me of
> something.
> 
> I occasionally scan large piles of paperwork using an MFP belonging to a
> local University. It emails me the results and has several options for
> the format and quality.
> 
> What I wanted was lossless files, so I selected TIFF instead of JPEG or
> PDF. But I later discovered that modern TIFF is a versatile container
> format, and the printer was sending me JPEG-in-TIFF.

I can understand the mistake. TIFF was a godsend discovery for me when
I had raw image data that I wanted read by "conventional software" as
it's very easy to package it up. And the tagging system means that you
can choose what to read in the file without getting indigestion.
But then you realise that the more modern tags let you package all
sorts of strange things in the file and if you want to read those
sections, you've got to do all the dirty work of unpacking it.
(I never did anything more complex than palette colours and run-length encoding.)

But what a disappointment that you didn't get simple tags and
raw data.

BTW a lot of people make a similar mistake with WAV files, assuming
that a .wav extension specifies certain properties of the file contents.

Cheers,
David.



Re: [OT] scanned files are large in size

2019-01-03 Thread David Wright
On Thu 03 Jan 2019 at 14:07:15 (+0100), Siard wrote:
> David Wright wrote:
> > So I can't understand your objection to wrapping a scanned image into
> > a PDF container, which makes a lot of data handling a lot easier than
> > would otherwise be the case.
> 
> After scanning, an image almost always needs editing. Crop, rotate to
> correct a skew horizon, remove specks, adjust light and contrast.

In that case it sounds as if selecting PDF would be the wrong format
for you to save in. I hope your scanner has a more appropriate choice
available.

> Gimp can open a pdf, but not in its original resolution, so there is
> loss of quality.  Pdfimages can extract the image first, but its
> original format (tiff? jpg? pnm?) remains unclear then, so there is a
> conversion, again causing loss of quality (AFAIU).

Not knowing your model of scanner, I can make no comment. People
presumably investigate how to obtain the highest quality scan from
whichever they buy, and in a format that is appropriate for them.

Here, if I were working directly on the bits of raw image, I would
choose PDF colour uncompressed 600dpi from which pdfimages yields
PPMs (type P6), which are easy to handle, unlike the lossy JPEG.

> > Other examples would be postprocessing with programs like pdftk and
> > pdfjam.
> 
> Those programs cannot edit images.

No, they're really for working at the level of pages. But some of the
things they do can be considered as "editing", like scaling, masking,
watermarking, straightening up (though one could be forgiven for
failing to find that option). These are the sorts of things that
commercial office workers might expect to do. (I haven't bothered
to mention collation, 90° rotations, and so on.) This is likely a much
bigger target for marketing all-in-one devices as well as cheaper
scanners.

I'm guessing that serious image manipulators buy much more versatile
up-market scanners, just as pro digital photographers expect their
cameras to be able to output raw image data. Some of the things they
do could be considered fraudulent in an office environment!

> > An obvious example was already mentioned: put a document into the
> > ADF, press the button, obtain one file containing the entire
> > document. [...] Would you really send a scanned document to a
> > company/institution as a multitude of image attachments instead of
> > a single PDF?
> 
> That should be the final stage of the process, not the beginning!
> You can use img2pdf to put the images in a pdf container, without
> affecting the image quality.

That would be a disaster for office productivity.

Cheers,
David.



Re: [OT] scanned files are large in size

2019-01-03 Thread Nicolas George
deloptes (2019-01-03):
> If it is a document, why should I open it in Gimp?

The level of "my use case is the only use case" in this subthread is
frightening and staggering.

-- 
  Nicolas George




Re: [OT] scanned files are large in size

2019-01-03 Thread deloptes
Siard wrote:

> Very different here. I scan from within Gimp:
> File > Create > XSane > Device dialog...
> Then the image scanned with XSane opens directly in Gimp.

If it is a document, why should I open it in Gimp?
The use cases for documents are either you save it somewhere (archive) or
you mail it to someone. Therefore you have usually also a mail button on
the scanner.

regards



Re: [OT] scanned files are large in size

2019-01-03 Thread Siard
deloptes wrote:
> Siard wrote:
> > After scanning, an image almost always needs editing. Crop, rotate
> > to correct a skew horizon, remove specks, adjust light and contrast.
> 
> Don't know about you, but I usually press the button and get the
> image. Sometimes I use PDF to scan multiple pages into one document,
> sometimes I scan in PNG or JPG and create a PDF out of them.
> 
> As a user I do not want to spend time for correction. Put the image
> in the scanner the way you want it to be scanned, select the options
> on software side and just press a button.

Very different here. I scan from within Gimp:
File > Create > XSane > Device dialog...
Then the image scanned with XSane opens directly in Gimp.



Re: [OT] scanned files are large in size

2019-01-03 Thread deloptes
Siard wrote:

> After scanning, an image almost always needs editing. Crop, rotate to
> correct a skew horizon, remove specks, adjust light and contrast.
> Gimp can open a pdf, but not in its original resolution, so there is
> loss of quality.  Pdfimages can extract the image first, but its
> original format (tiff? jpg? pnm?) remains unclear then, so there is a
> conversion, again causing loss of quality (AFAIU).

Don't know about you, but I usually press the button and get the image.
Sometimes I use PDF to scan multiple pages into one document, sometimes I
scan in PNG or JPG and create a PDF out of them.

As a user I do not want to spend time for correction. Put the image in the
scanner the way you want it to be scanned, select the options on software
side and just press a button.





Re: [OT] scanned files are large in size

2019-01-03 Thread Jonathan Dowland

I'm replying to the top-level of this thread because it's not a direct
reply to any particular message, but the thread reminded me of
something.

I occasionally scan large piles of paperwork using an MFP belonging to a
local University. It emails me the results and has several options for
the format and quality.

What I wanted was lossless files, so I selected TIFF instead of JPEG or
PDF. But I later discovered that modern TIFF is a versatile container
format, and the printer was sending me JPEG-in-TIFF.
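
One way to check what is actually inside a TIFF is to read the Compression tag (tag 259) from its first image directory. The following is a minimal sketch, not a full TIFF parser: it looks only at the first IFD of a classic (non-BigTIFF) file, and the function name and lookup table are made up for illustration.

```python
import struct

# Some values of the TIFF Compression tag (tag 259).
COMPRESSION_NAMES = {
    1: "uncompressed",
    2: "CCITT modified Huffman RLE",
    3: "CCITT Group 3 fax",
    4: "CCITT Group 4 fax",
    5: "LZW",
    6: "JPEG (old style)",
    7: "JPEG",
    32773: "PackBits",
}


def tiff_compression(data: bytes) -> str:
    """Name the compression scheme declared in the first IFD of a TIFF."""
    if data[:2] == b"II":
        endian = "<"   # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"   # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i   # each IFD entry is 12 bytes
        tag, _type, _n = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == 259:
            # A single SHORT is stored in the first two bytes of the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return COMPRESSION_NAMES.get(value, f"unknown ({value})")
    return COMPRESSION_NAMES[1]   # the tag defaults to 1 when absent
```

Calling `tiff_compression(open("scan.tif", "rb").read())` on a file like the ones described above (the file name is hypothetical) would be expected to report "JPEG" rather than "uncompressed".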

--

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Jonathan Dowland
⢿⡄⠘⠷⠚⠋⠀ https://jmtd.net
⠈⠳⣄ Please do not CC me, I am subscribed to the list.



Re: [OT] scanned files are large in size

2019-01-03 Thread Nicolas George
Siard (2019-01-03):
> After scanning, an image almost always needs editing. Crop, rotate to
> correct a skew horizon, remove specks, adjust light and contrast.

That depends on the purpose.

>   Pdfimages can extract the image first, but its
> original format (tiff? jpg? pnm?) remains unclear then, so there is a
> conversion, again causing loss of quality (AFAIU).

You are mistaken. If the image is stored losslessly in the PDF, then
extracting it to any lossless format will not lose any quality. If the
image is stored with a lossy codec, pdfimages can be directed (RTFM) to
re-wrap the codec data in an adequate container, causing no extra loss
of quality either.
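
If the worry is that the format of an extracted file "remains unclear", the first few bytes usually settle it. Here is a small sketch that identifies a file by its leading signature; the function name is made up, and the list covers only the formats mentioned in this thread.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess an image format from the leading bytes of a file."""
    signatures = [
        (b"\xff\xd8\xff", "JPEG"),
        (b"\x89PNG\r\n\x1a\n", "PNG"),
        (b"II*\x00", "TIFF (little-endian)"),
        (b"MM\x00*", "TIFF (big-endian)"),
        (b"P4", "PBM (binary bitmap)"),
        (b"P5", "PGM (binary greymap)"),
        (b"P6", "PPM (binary pixmap)"),
    ]
    for magic, name in signatures:
        if data.startswith(magic):
            return name
    return "unknown"
```

A file that reports JPEG here carries lossy codec data, whatever its extension or wrapper; a PPM or PGM holds plain decoded pixels.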

Regards,

-- 
  Nicolas George




Re: [OT] scanned files are large in size

2019-01-03 Thread Siard
David Wright wrote:
> So I can't understand your objection to wrapping a scanned image into
> a PDF container, which makes a lot of data handling a lot easier than
> would otherwise be the case.

After scanning, an image almost always needs editing. Crop, rotate to
correct a skew horizon, remove specks, adjust light and contrast.
Gimp can open a pdf, but not in its original resolution, so there is
loss of quality.  Pdfimages can extract the image first, but its
original format (tiff? jpg? pnm?) remains unclear then, so there is a
conversion, again causing loss of quality (AFAIU).

> Other examples would be postprocessing with programs like pdftk and
> pdfjam.

Those programs cannot edit images.

> An obvious example was already mentioned: put a document into the
> ADF, press the button, obtain one file containing the entire
> document. [...] Would you really send a scanned document to a
> company/institution as a multitude of image attachments instead of
> a single PDF?

That should be the final stage of the process, not the beginning!
You can use img2pdf to put the images in a pdf container, without
affecting the image quality.



Re: [OT] scanned files are large in size

2019-01-03 Thread Jörg-Volker Peetz
Maybe you could then try some of the switches for ps2pdf, for example

$ ps2pdf -dPDFSETTINGS=/printer  old.pdf  new.pdf

"/printer" makes it 300dpi, "/ebook" 150 dpi, and "/screen" 72 dpi, the
documentation can tell you more.
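
To get a feel for what those resolutions mean, here is a sketch of the pixel arithmetic only. It assumes a 600 dpi scan of 5100 x 6600 pixels (the page size reported elsewhere in this thread) and takes the three dpi figures above at face value; the actual file size also depends on how the images are re-encoded, which this does not model.

```python
# Nominal target resolutions as quoted in the message above.
PDFSETTINGS_DPI = {"/printer": 300, "/ebook": 150, "/screen": 72}


def downsampled_size(width_px, height_px, scan_dpi, target_dpi):
    """Pixel dimensions after resampling from scan_dpi to target_dpi."""
    return (width_px * target_dpi // scan_dpi,
            height_px * target_dpi // scan_dpi)


for preset, dpi in PDFSETTINGS_DPI.items():
    w, h = downsampled_size(5100, 6600, 600, dpi)
    factor = (5100 * 6600) / (w * h)
    print(f"{preset}: {w} x {h} pixels, about 1/{factor:.0f} of the pixel count")
```

Halving the resolution quarters the number of pixels, so "/printer" already leaves only a quarter of the image data to compress.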

Regards,
Jörg.



Re: [OT] scanned files are large in size

2019-01-02 Thread David Wright
On Wed 02 Jan 2019 at 22:50:10 (-0500), kamaraju kusumanchi wrote:
> On Wed, Jan 2, 2019 at 6:33 AM Jörg-Volker Peetz  wrote:
> >
> > With the pdf-files from my Canon scanner, I did shrink them with the help of
> > ghostscript:
> >
> > $ ps2pdf  old.pdf  new.pdf
> >
> 
> This does not help. The file sizes are more or less the same (if
> anything, they are slightly larger).

That's my usual experience too. However, I looked around for an old
PDF and found a magazine distributed by my old employer. I could halve
the file size as above. Looking at the original, there's a lot more
legible XML (like image metadata) though it's difficult to tell if
that accounts for the difference. I think it was produced on a Mac;
most of the images certainly were.

$ pdfinfo /tmp/original.pdf
Creator:        Adobe InDesign CS3 (5.0.2)
Producer:       Adobe PDF Library 8.0
CreationDate:   Thu May  1 05:47:22 2008 CDT
ModDate:        Thu May  1 06:21:43 2008 CDT
Tagged:         no
UserProperties: no
Suspects:       no
Form:           AcroForm
JavaScript:     no
Pages:          48
Encrypted:      no
Page size:      595.276 x 841.89 pts (A4)
Page rot:       0
File size:      7297299 bytes
Optimized:      no
PDF version:    1.6
$ pdfinfo /tmp/new.pdf
Creator:        Adobe InDesign CS3 (5.0.2)
Producer:       GPL Ghostscript 9.26
CreationDate:   Wed Jan  2 22:21:22 2019 CST
ModDate:        Wed Jan  2 22:21:22 2019 CST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          48
Encrypted:      no
Page size:      595.276 x 841.89 pts (A4)
Page rot:       0
File size:      3769525 bytes
Optimized:      no
PDF version:    1.4
$ 

Cheers,
David.



Re: [OT] scanned files are large in size

2019-01-02 Thread kamaraju kusumanchi
On Wed, Jan 2, 2019 at 9:23 PM David Wright  wrote:
>
> On Wed 02 Jan 2019 at 14:44:14 (+), Brian wrote:
> >
> > I'm intrigued; I hadn't realised that conversion of the scanned image
> > for some vendors' devices took place on the device itself. How do you
> > know this happens? It is the frontend to SANE (xsane or scanimage, for
> > > example) which I've always associated with image acquisition conversion.
>
> It really is rather easy. You insert a USB stick into the scanner,
> press scan, and later observe that a JPEG or PDF file has appeared
> on the stick, as appropriate.
>

Yes, that is precisely what I did. Stick a USB into the scanner and
press the scan button.

-- 
Kamaraju S Kusumanchi | http://raju.shoutwiki.com/wiki/Blog



Re: [OT] scanned files are large in size

2019-01-02 Thread kamaraju kusumanchi
On Wed, Jan 2, 2019 at 6:33 AM Jörg-Volker Peetz  wrote:
>
> With the pdf-files from my Canon scanner, I did shrink them with the help of
> ghostscript:
>
> $ ps2pdf  old.pdf  new.pdf
>

This does not help. The file sizes are more or less the same (if
anything, they are slightly larger).

Original files:
% ls -al scanned_in_office.pdf scanned_on_mx870.pdf
-rw-r--r-- 1 rajulocal rajulocal  331796 Jan  1 11:54 scanned_in_office.pdf
-rw-r--r-- 1 rajulocal rajulocal 1775460 Jan  1 11:48 scanned_on_mx870.pdf

Conversion:
% ps2pdf scanned_in_office.pdf file1.pdf
% ps2pdf scanned_on_mx870.pdf file2.pdf

New file sizes:
% ls -al file1.pdf file2.pdf
-rw-r--r-- 1 rajulocal rajulocal  338539 Jan  2 22:48 file1.pdf
-rw-r--r-- 1 rajulocal rajulocal 1775470 Jan  2 22:48 file2.pdf
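
Expressed as percentages, the change is small for one file and negligible for the other. This is a quick sketch using the byte counts from the two `ls` listings above; the helper name is made up.

```python
def percent_change(before_bytes, after_bytes):
    """Relative size change in percent; positive means the file grew."""
    return 100 * (after_bytes - before_bytes) / before_bytes


# (before, after) sizes in bytes, copied from the ls output above.
sizes = {
    "scanned_in_office.pdf": (331796, 338539),
    "scanned_on_mx870.pdf": (1775460, 1775470),
}

for name, (before, after) in sizes.items():
    print(f"{name}: {percent_change(before, after):+.2f}%")
```

The office scan grows by about 2% and the MX870 scan by only 10 bytes.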

-- 
Kamaraju S Kusumanchi | http://raju.shoutwiki.com/wiki/Blog



Re: [OT] scanned files are large in size

2019-01-02 Thread David Wright
On Wed 02 Jan 2019 at 16:54:37 (+), Joe wrote:
> On Wed, 2 Jan 2019 15:51:47 +0100  wrote:
> > Some scanners mail things around, these days. I don't want to even
> > think about how many security holes lurk in there.
> 
> The practical alternatives are to hook the scanner into the SMB sharing
> system, or use sneakernet. Which of those has fewer potential security
> issues? 

Well that depends on what you mean by security. I assume from what you
write below that you're perhaps thinking of the person who scans
sensitive documents onto a USB stick and carries it out of the building.

In my case, the security vulnerability is the sneakernet one: I use a
stick to transfer the scans from the scanner to an encrypted laptop.
A burglar could steal the unsecured stick as part of their swag.

> A company I do some work for is no longer self-sufficient in IT, it is
> part of the company international IT Borg. As such, its PCs are all
> hooked by VPN into a Windows domain, which rules out SMB networking with
> anything non-Borg, and all the USB sockets are disabled. Not, oddly,
> the SD card slots...
> 
> I've just eliminated an old PC there, which existed solely as a SMTP
> server for the scanner. It now uses a Raspberry Pi running (straying
> back on-topic) Raspbian.

My scanner is either one floor up or one down from where I normally
work (attic office, basement kitchen). Wifi scanning directly into the
computer would be a pointless exercise because the documents still
have to be physically loaded onto the scanner bed.

Cheers,
David.



Re: [OT] scanned files are large in size

2019-01-02 Thread David Wright
On Wed 02 Jan 2019 at 19:17:00 (+), mick crane wrote:
> On 2019-01-02 10:24, to...@tuxteam.de wrote:
> > On Wed, Jan 02, 2019 at 09:40:33AM +, Joe wrote:
> > > On Wed, 2 Jan 2019 09:59:48 +0100 to...@tuxteam.de wrote:
> > > > And next time, try to find a scanner which provides you with a raw
> > > > image. Wrapping images in PDFs is... not elegant.
> > > 
> > > They do this to cater for multiple pages, whereas in my experience,
> > > most scanning is single-sheet. Even the Simple Scan program on Debian
> > > defaults to pdf, something which cannot be configured.
> > 
> > > I get this, and offering that option seems to make sense. But forcing
> > > it (and forcing an image format like JPEG) doesn't make sense. So
> > > either provide the knobs or let the host software do it.
> > > 
> > > My scanner just transfers the raw image. The scan program is
> > > responsible for the transformation to the target format, which I can
> > > choose. This is how /I/ want to be treated, as a paying customer.
> 
> having a scanner do PDFs is weird, see Obama birth certificate, how do
> you know is a faithful copy ?
> A piece of paper with marks on it is an image and should be treated as
> such.

If I could be bothered to look, I might be able to come across a raw
sound or video file on this Debian system. It's quite normal to wrap
such raw data in a container format of some sort.

So I can't understand your objection to wrapping a scanned image into
a PDF container, which makes a lot of data handling a lot easier than
would otherwise be the case. An obvious example was already mentioned:
put a document into the ADF, press the button, obtain one file
containing the entire document. Other examples would be postprocessing
with programs like pdftk and pdfjam. Would you really send a scanned
document to a company/institution as a multitude of image attachments
instead of a single PDF?

If you want the image back from a PDF, that's what the pdfimages
program is for. I would assume that tomás is pleased with the
packaging of the raw scan data into an image format of some
description, so what's the difference?

Cheers,
David.



Re: [OT] scanned files are large in size

2019-01-02 Thread David Wright
On Wed 02 Jan 2019 at 14:44:14 (+), Brian wrote:
> On Tue 01 Jan 2019 at 22:41:06 -0500, kamaraju kusumanchi wrote:
> 
> > On Tue, Jan 1, 2019 at 1:40 PM  wrote:
> > >
> > > > Yep. The one image is encoded as CCITT (aka Group 4, aka fax [1]), which
> > > > is passable for low res B&W images, but not that much for hi-res or color
> > > > (or gray scale). It compresses much worse than the other which is JPEG,
> > > > which is expressly made for hi-res and color (or grayscale) images.
> > >
> > > OTOH, CCITT is lossless and JPEG lossy ;-)
> > >
> > ok, thanks.
> > 
> > > > Questions:
> > > > 1) Does the large file size have anything to do with the printer
> > > > itself? Is there anything I can do (ex:- update the driver/firmware or
> > > > something)?
> > >
> > > That depends on what is encoding the images: does the scanner itself
> > > "make" the PDF? Or some software, computer-side?
> > >
> > 
> > The scanner itself makes the pdf files.
> 
> I'm intrigued; I hadn't realised that conversion of the scanned image
> for some vendors' devices took place on the device itself. How do you
> know this happens? It is the frontend to SANE (xsane or scanimage, for
> example) which I've always associated with image acquisition conversion.

It really is rather easy. You insert a USB stick into the scanner,
press scan, and later observe that a JPEG or PDF file has appeared
on the stick, as appropriate.

Cheers,
David.



Re: [OT] scanned files are large in size

2019-01-02 Thread Brian
On Wed 02 Jan 2019 at 19:17:00 +, mick crane wrote:

> On 2019-01-02 10:24, to...@tuxteam.de wrote:
> > On Wed, Jan 02, 2019 at 09:40:33AM +, Joe wrote:
> > > On Wed, 2 Jan 2019 09:59:48 +0100
> > > to...@tuxteam.de wrote:
> > > 
> > > 
> > > >
> > > > And next time, try to find a scanner which provides you with a raw
> > > > image. Wrapping images in PDFs is... not elegant.
> > > 
> > > They do this to cater for multiple pages, whereas in my experience,
> > > most scanning is single-sheet. Even the Simple Scan program on Debian
> > > defaults to pdf, something which cannot be configured.
> > 
> > I get this, and offering that option seems to make sense. But forcing
> > it (and forcing an image format like JPEG) doesn't make sense. So either
> > provide the knobs or let the host software do it.
> > 
> > My scanner just transfers the raw image. The scan program is responsible
> > for the transformation to the target format, which I can choose. This is
> > how /I/ want to be treated, as a paying customer.
> > 
> 
> having a scanner do PDFs is weird, see Obama birth certificate, how do you
> know is a faithful copy ?

Eh?

> A piece of paper with marks on it is an image and should be treated as such.

Egad. I wish I had thought of that.

-- 
Brian.



Re: [OT] scanned files are large in size

2019-01-02 Thread Brian
On Tue 01 Jan 2019 at 22:51:02 -0500, kamaraju kusumanchi wrote:

> On Tue, Jan 1, 2019 at 3:04 PM Brian  wrote:
> >
> > On Tue 01 Jan 2019 at 12:34:38 -0500, kamaraju kusumanchi wrote:
> >
> > > A scanned document from Canon pixma mx870 printer is significantly
> > > larger compared to the same document scanned on a different scanner.
> >
> > Which is...?
> 
> Do not have this information at the moment. Will provide it tomorrow.
> 
> > > When I look at both the images side by side on a PC, there is no
> > > visual difference between the two. I am trying to understand the
> > > underlying cause and fix it if possible.
> >
> > You could mention which scanning software you used and what the
> > setting for the output file format was.
> 
> Both images are obtained from the scanners directly. I did not use any
> specific software per se. The only setting I had to choose was the dpi
> - which in both cases is set to 600.

Ah, I think I see now. You used a button on the device to initiate a
scan. I was thinking in terms of something like xsane being used.

> > > Questions:
> > > 1) Does the large file size have anything to do with the printer
> > > itself? Is there anything I can do (ex:- update the driver/firmware or
> > > something)?
> >
> > Not at all; the printer has nothing to do with it. Printing is printing.
> > Scanning is scanning.
> >
> 
> Understood. This is an 'all in one' printer which has both printing
> and scanning capabilities.
> 
> > > 2) Is the difference in image sizes due to the bpc (1 vs. 8) or
> > > > encoding (ccitt vs jpeg) fields?
> >
> > Could be.
> >
> > > 3) If yes, how to change them?
> >
> > One file is in (I think) tiff format. The other isn't. You didn't scan
> > like and like from both devices.
> 
> There are not that many options to choose from the scan settings. You
> just choose the dpi and that is about it.
> 
> I understand that we can't change much on what the scanner produces.
> But there should be some software to further change the scanner's
> output files?

Jörg-Volker Peetz has indicated a technique; it can work. For a smaller
file size you could also reduce the resolution from 600.
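
As a rough illustration of how much the resolution matters: the pixel
count grows with the square of the dpi, so halving it leaves a quarter
of the pixels to encode. A minimal shell sketch, assuming a US Letter
page (8.5 x 11 in, which is what 5100 x 6600 pixels at 600 dpi works
out to); the function name page_pixels is made up for this example:

```shell
# page_pixels DPI [WIDTH_IN HEIGHT_IN]
# Prints the pixel dimensions and pixel count of a page scanned at DPI.
# The defaults assume US Letter (8.5 x 11 in); pass other sizes if needed.
page_pixels() {
    awk -v dpi="$1" -v w="${2:-8.5}" -v h="${3:-11}" 'BEGIN {
        printf "%d x %d = %d pixels\n", w * dpi, h * dpi, w * dpi * h * dpi
    }'
}

page_pixels 600
page_pixels 300
```

At 300 dpi the page is 2550 x 3300 = 8415000 pixels, against 33660000
at 600 dpi. The compressed file will not shrink by exactly that factor,
but it should move in the same direction.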

-- 
Brian.



Re: [OT] scanned files are large in size

2019-01-02 Thread mick crane

On 2019-01-02 10:24, to...@tuxteam.de wrote:
> On Wed, Jan 02, 2019 at 09:40:33AM +, Joe wrote:
> > On Wed, 2 Jan 2019 09:59:48 +0100
> > to...@tuxteam.de wrote:
> > >
> > > And next time, try to find a scanner which provides you with a raw
> > > image. Wrapping images in PDFs is... not elegant.
> >
> > They do this to cater for multiple pages, whereas in my experience,
> > most scanning is single-sheet. Even the Simple Scan program on Debian
> > defaults to pdf, something which cannot be configured.
>
> I get this, and offering that option seems to make sense. But forcing
> it (and forcing an image format like JPEG) doesn't make sense. So either
> provide the knobs or let the host software do it.
>
> My scanner just transfers the raw image. The scan program is responsible
> for the transformation to the target format, which I can choose. This is
> how /I/ want to be treated, as a paying customer.

having a scanner do PDFs is weird, see Obama birth certificate, how do
you know is a faithful copy ?
A piece of paper with marks on it is an image and should be treated as
such.


mick



--
Key ID4BFEBB31



Re: [OT] scanned files are large in size

2019-01-02 Thread Brian
On Wed 02 Jan 2019 at 16:14:10 +0100, deloptes wrote:

> to...@tuxteam.de wrote:
> 
> > Some scanners mail things around, these days. I don't want to even think
> > about how many security holes lurk in there.
> > 
> > There's no limit to the amount of stupid^H^H^H^H^H^Hnonsense vendors are
> > capable of when enough computing power is put in their hands.
> 
> When user is asking (and paying) for it, you just deliver (the bare minimum
> to satisfy the need). In the past 5-10 years there was a big change in
> printing and scanning in all the companies I've been with. Now you have
> the "follow me" option and you can access your print jobs on each printer,
> you can scan, mail or fax from each printer (as it has usually scanner on
> top - this multifunction crap). From hardware and software perspective it
> might be a disaster ... but who cares if the crap works.

You are correct, the last 5-10 years have seen a change in printing and
scanning technology. It is all for the better and Debian is up there
with the advancements. Doesn't this give you a warm and fuzzy New Year's
feeling of satisfaction?

Millions of people are using multifunctions successfully for their
everyday needs and with little complaint. No, that's an underestimate!
10s and 100s of millions.

-- 
Brian.



Re: [OT] scanned files are large in size

2019-01-02 Thread Joe
On Wed, 2 Jan 2019 15:51:47 +0100
 wrote:

>
> 
> Some scanners mail things around, these days. I don't want to even
> think about how many security holes lurk in there.

The practical alternatives are to hook the scanner into the SMB sharing
system, or use sneakernet. Which of those has fewer potential security
issues? 

A company I do some work for is no longer self-sufficient in IT; it is
part of the company's international IT Borg. As such, its PCs are all
hooked by VPN into a Windows domain, which rules out SMB networking with
anything non-Borg, and all the USB sockets are disabled. Not, oddly,
the SD card slots...

I've just eliminated an old PC there, which existed solely as an SMTP
server for the scanner. It now uses a Raspberry Pi running (straying
back on-topic) Raspbian.

-- 
Joe



Re: [OT] scanned files are large in size

2019-01-02 Thread tomas
On Wed, Jan 02, 2019 at 05:18:19PM +0100, deloptes wrote:
> to...@tuxteam.de wrote:
> 
> > I used to believe that, too. But nowadays I think users can be (and get!)
> > nudged into asking for whatever vendors want them to want.

[...]

> I think companies are triggering the features as companies pay for support
> and IMO through support most money is made.

Companies have also another "feature" (at least bigger ones): those making
the buying decision aren't those having to use the product. I guess there
are hordes of salespeople trained just on this feature.

> But the schemes are also there
> for the users. Why would you have a business product line and a personal
> one. The personal is usually less expensive and if you are not a company
> you may not purchase business product?

Yes, of course.

Cheers
-- t


signature.asc
Description: Digital signature


Re: [OT] scanned files are large in size

2019-01-02 Thread deloptes
to...@tuxteam.de wrote:

> I used to believe that, too. But nowadays I think users can be (and get!)
> nudged into asking for whatever vendors want them to want.
> 

Let's not talk about the users, because what I am seeing in the minds of the
millennials is frightening. So I do not expect it to get better (unfortunately).

> It's like smoking: the ideal thing to sell, because people aren't getting
> what they look for (freedom, adventure) but just a stick which quickly
> burns away. They *have* to return for more -- the ideal merchandise, if
> you ask me. Then, it reportedly damages the user's health. Yet vendors
> have always managed to convince their users to buy and smoke that stuff.
> 

The addiction to nicotine is something I miss in regards to scanners and
printers.

> Why shouldn't that work with scanners, or software, or security
> "products", or DRM schemes?

I think companies are triggering the features as companies pay for support
and IMO through support most money is made. But the schemes are also there
for the users. Why would you have a business product line and a personal
one. The personal is usually less expensive and if you are not a company
you may not purchase business product?



Re: [OT] scanned files are large in size

2019-01-02 Thread tomas
On Wed, Jan 02, 2019 at 04:14:10PM +0100, deloptes wrote:
> to...@tuxteam.de wrote:
> 
> > Some scanners mail things around, these days. I don't want to even think
> > about how many security holes lurk in there.
> > 
> > There's no limit to the amount of stupid^H^H^H^H^H^Hnonsense vendors are
> > capable of when enough computing power is put in their hands.
> 
> When user is asking (and paying) for it, you just deliver (the bare minimum
> to satisfy the need) [...]

I used to believe that, too. But nowadays I think users can be (and get!)
nudged into asking for whatever vendors want them to want.

It's like smoking: the ideal thing to sell, because people aren't getting
what they look for (freedom, adventure) but just a stick which quickly
burns away. They *have* to return for more -- the ideal merchandise, if
you ask me. Then, it reportedly damages the user's health. Yet vendors
have always managed to convince their users to buy and smoke that stuff.

Why shouldn't that work with scanners, or software, or security "products",
or DRM schemes?

Cheers
-- tomás



Re: [OT] scanned files are large in size

2019-01-02 Thread deloptes
to...@tuxteam.de wrote:

> Some scanners mail things around, these days. I don't want to even think
> about how many security holes lurk in there.
> 
> There's no limit to the amount of stupid^H^H^H^H^H^Hnonsense vendors are
> capable of when enough computing power is put in their hands.

When user is asking (and paying) for it, you just deliver (the bare minimum
to satisfy the need). In the past 5-10 years there was a big change in
printing and scanning in all the companies I've been with. Now you have
the "follow me" option and you can access your print jobs on each printer,
you can scan, mail or fax from each printer (as it has usually scanner on
top - this multifunction crap). From hardware and software perspective it
might be a disaster ... but who cares if the crap works.

regards



Re: [OT] scanned files are large in size

2019-01-02 Thread Nicolas George
to...@tuxteam.de (2019-01-02):
> Some scanners mail things around, these days. I don't want to even think
> about how many security holes lurk in there.
> 
> There's no limit to the amount of stupid^H^H^H^H^H^Hnonsense vendors are
> capable of when enough computing power is put in their hands.

Some scanners/copiers use a lossy compression algorithm based on
small repeated 2D patterns that can lead to substituting a digit for
another.

http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning?

Regards,

-- 
  Nicolas George



Re: [OT] scanned files are large in size

2019-01-02 Thread tomas
On Wed, Jan 02, 2019 at 03:56:27PM +0100, Nicolas George wrote:

[...]

> http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning?

=:-o

I just skimmed that but... it's exquisite, for a very strange value
of "exquisite".

Thanks
-- tomás



Re: [OT] scanned files are large in size

2019-01-02 Thread tomas
On Wed, Jan 02, 2019 at 02:44:14PM +, Brian wrote:
> On Tue 01 Jan 2019 at 22:41:06 -0500, kamaraju kusumanchi wrote:

[...]

> > The scanner itself makes the pdf files.
> 
> I'm intrigued; I hadn't realised that conversion of the scanned image
> for some vendors' devices took place on the device itself. How do you
> know this happens? It is the frontend to SANE (xsane or scanimage, for
> example) which I've always associated with image acquisition conversion.

Some scanners mail things around, these days. I don't want to even think
about how many security holes lurk in there.

There's no limit to the amount of stupid^H^H^H^H^H^Hnonsense vendors are
capable of when enough computing power is put in their hands.

Cheers
-- tomás



Re: [OT] scanned files are large in size

2019-01-02 Thread Brian
On Tue 01 Jan 2019 at 22:41:06 -0500, kamaraju kusumanchi wrote:

> On Tue, Jan 1, 2019 at 1:40 PM  wrote:
> >
> > Yep. The one image is encoded as CCITT (aka Group 4, aka fax [1]), which is
> > passable for low res B&W images, but not that much for hi-res or color (or
> > gray scale). It compresses much worse than the other which is JPEG, which is
> > expressly made for hi-res and color (or grayscale) images.
> >
> > OTOH, CCITT is lossless and JPEG lossy ;-)
> >
> ok, thanks.
> 
> > > Questions:
> > > 1) Does the large file size have anything to do with the printer
> > > itself? Is there anything I can do (ex:- update the driver/firmware or
> > > something)?
> >
> > That depends on what is encoding the images: does the scanner itself
> > "make" the PDF? Or some software, computer-side?
> >
> 
> The scanner itself makes the pdf files.

I'm intrigued; I hadn't realised that conversion of the scanned image
for some vendors' devices took place on the device itself. How do you
know this happens? It is the frontend to SANE (xsane or scanimage, for
example) which I've always associated with image acquisition conversion.

-- 
Brian.



Re: [OT] scanned files are large in size

2019-01-02 Thread Jörg-Volker Peetz
With the pdf-files from my Canon scanner, I did shrink them with the help of
ghostscript:

$ ps2pdf  old.pdf  new.pdf

Documentation can be found in ghostscript-doc.
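
A possible variation, as a sketch only: ps2pdf hands its options on to
Ghostscript, so one of Ghostscript's -dPDFSETTINGS presets can be
selected to downsample the page images as well. The wrapper name
shrink_pdf and the default preset are choices made for this example;
/ebook resamples images to roughly 150 dpi, so compare the result with
the original before discarding anything.

```shell
# shrink_pdf IN.pdf OUT.pdf [PRESET]
# PRESET is one of Ghostscript's -dPDFSETTINGS values:
# screen, ebook, printer, prepress or default (ebook if omitted).
shrink_pdf() {
    if [ "$#" -lt 2 ]; then
        echo "usage: shrink_pdf IN.pdf OUT.pdf [PRESET]" >&2
        return 2
    fi
    # ps2pdf passes the -d option through to gs unchanged.
    ps2pdf -dPDFSETTINGS="/${3:-ebook}" "$1" "$2"
}

# Example, with the file names used above:
#   shrink_pdf old.pdf new.pdf
#   ls -l old.pdf new.pdf
```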

Regards,
Jörg.



Re: [OT] scanned files are large in size

2019-01-02 Thread tomas
On Wed, Jan 02, 2019 at 11:04:07AM +, Chris Ramsden wrote:
> On 2019-01-02 10:24, to...@tuxteam.de wrote:
> > My scanner just transfers the raw image. The scan program is responsible
> > for the transformation to the target format, which I can choose. This is
> > how /I/ want to be treated, as a paying customer.
> >
> > Cheers
> > -- tomás
> 
> Would you mind sharing with us what make and model you use? I haven't
> found it easy to identify which scanners offer this feature.

Sorry, I don't have access to it at the moment. I'll try to look it
up when I'm back home.

Anyway, it's now over 12 years old -- I doubt it's still on the
market. At that time I looked at the SANE databases [1] to help
me make a decision, together with my (then) computer dealer (ah,
I miss him, he knew what matters: having a computer dealer you
trust is worth gold).

Cheers
[1] http://sane-project.org/
-- tomás



Re: [OT] scanned files are large in size

2019-01-02 Thread deloptes
to...@tuxteam.de wrote:

> I get this, and offering that option seems to make sense. But forcing
> it (and forcing an image format like JPEG) doesn't make sense. So either
> provide the knobs or let the host software do it.
> 
> My scanner just transfers the raw image. The scan program is responsible
> for the transformation to the target format, which I can choose. This is
> how /I/ want to be treated, as a paying customer.

+1

I did some research before buying a scanner and I bought Epson Perfection
33-330. The iscan application lets you choose the image format. I am not
quite sure what it uses if I say multiple pages in one PDF. I will try
this, but I assume it acts based on the configuration before scanning (i.e.
grayscale or black/white etc.)

regards



Re: [OT] scanned files are large in size

2019-01-02 Thread Chris Ramsden
On 2019-01-02 10:24, to...@tuxteam.de wrote:
> My scanner just transfers the raw image. The scan program is responsible
> for the transformation to the target format, which I can choose. This is
> how /I/ want to be treated, as a paying customer.
>
> Cheers
> -- tomás

Would you mind sharing with us what make and model you use? I haven't
found it easy to identify which scanners offer this feature.

-- 
Chris



Re: [OT] scanned files are large in size

2019-01-02 Thread tomas
On Wed, Jan 02, 2019 at 09:40:33AM +, Joe wrote:
> On Wed, 2 Jan 2019 09:59:48 +0100
> to...@tuxteam.de wrote:
> 
> 
> > 
> > And next time, try to find a scanner which provides you with a raw
> > image. Wrapping images in PDFs is... not elegant.
> 
> They do this to cater for multiple pages, whereas in my experience,
> most scanning is single-sheet. Even the Simple Scan program on Debian
> defaults to pdf, something which cannot be configured.

I get this, and offering that option seems to make sense. But forcing
it (and forcing an image format like JPEG) doesn't make sense. So either
provide the knobs or let the host software do it.

My scanner just transfers the raw image. The scan program is responsible
for the transformation to the target format, which I can choose. This is
how /I/ want to be treated, as a paying customer.

Cheers
-- tomás



Re: [OT] scanned files are large in size

2019-01-02 Thread Joe
On Wed, 2 Jan 2019 09:59:48 +0100
to...@tuxteam.de wrote:


> 
> And next time, try to find a scanner which provides you with a raw
> image. Wrapping images in PDFs is... not elegant.

They do this to cater for multiple pages, whereas in my experience,
most scanning is single-sheet. Even the Simple Scan program on Debian
defaults to pdf, something which cannot be configured.

-- 
Joe



Re: [OT] scanned files are large in size

2019-01-02 Thread tomas
On Tue, Jan 01, 2019 at 10:41:06PM -0500, kamaraju kusumanchi wrote:
> On Tue, Jan 1, 2019 at 1:40 PM  wrote:

[...]

> > OTOH, CCITT is lossless and JPEG lossy ;-)
> >
> ok, thanks.

But note that Anders observed that the larger files are actually
the JPEGs (somewhat to my surprise). A possible explanation would
be that the CCITT loses the grayscale information, as Thomas observed
(CCITT is bitonal), but then you should be able to see a difference
between both images.

In any case, if the scanners insist on producing PDFs, you'll have
to extract the images, convert them and, if necessary, repack them
again as PDFs. For a one-off job, the Gimp seems just about right;
if you want to automate it, I'd try with the ImageMagick suite, but
perhaps there are folks around who have more experience in those
things.
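
For the automated route, a minimal sketch of the extract / convert /
repack cycle, assuming poppler-utils (pdfimages), ImageMagick (convert)
and libtiff-tools (tiffcp, tiff2pdf) are installed, that the PDF holds
one JPEG per page and that the scan was made at 600 dpi. The function
name and the 50% threshold are illustrative choices, not something
tested on the files in this thread:

```shell
# jpeg_pdf_to_g4 IN.pdf OUT.pdf
# Extracts the JPEG page images, thresholds them to 1 bit per pixel,
# stores them as CCITT Group 4 TIFFs and wraps those in a PDF again.
jpeg_pdf_to_g4() {
    if [ "$#" -ne 2 ]; then
        echo "usage: jpeg_pdf_to_g4 IN.pdf OUT.pdf" >&2
        return 2
    fi
    tmp=$(mktemp -d) || return 1
    (
        # -j keeps DCT (JPEG) images as page-000.jpg, page-001.jpg, ...
        pdfimages -j "$1" "$tmp/page" || exit 1
        for f in "$tmp"/page-*.jpg; do
            # 600 dpi is an assumption; it only sets the resolution tag.
            convert "$f" -threshold 50% -type Bilevel \
                -units PixelsPerInch -density 600 \
                -compress Group4 "${f%.jpg}.tif" || exit 1
        done
        # Join the single pages into one multi-page TIFF, then wrap it.
        tiffcp "$tmp"/page-*.tif "$tmp/all.tif" || exit 1
        tiff2pdf -o "$2" "$tmp/all.tif"
    )
    rc=$?
    rm -rf "$tmp"
    return "$rc"
}

# Example:
#   jpeg_pdf_to_g4 scanned_on_mx870.pdf scanned_on_mx870_g4.pdf
```

Thresholding to 1 bit throws away the grayscale information, so this
suits text pages rather than photographs.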

And next time, try to find a scanner which provides you with a raw
image. Wrapping images in PDFs is... not elegant.

Cheers
-- tomás



Re: [OT] scanned files are large in size

2019-01-01 Thread kamaraju kusumanchi
On Tue, Jan 1, 2019 at 3:04 PM Brian  wrote:
>
> On Tue 01 Jan 2019 at 12:34:38 -0500, kamaraju kusumanchi wrote:
>
> > A scanned document from Canon pixma mx870 printer is significantly
> > larger compared to the same document scanned on a different scanner.
>
> Which is...?

Do not have this information at the moment. Will provide it tomorrow.

> > When I look at both the images side by side on a PC, there is no
> > visual difference between the two. I am trying to understand the
> > underlying cause and fix it if possible.
>
> You could mention which scanning software you used and what the
> setting for the output file format was.
>

Both images are obtained from the scanners directly. I did not use any
specific software per se. The only setting I had to choose was the dpi
- which in both cases is set to 600.

> > Questions:
> > 1) Does the large file size have anything to do with the printer
> > itself? Is there anything I can do (ex:- update the driver/firmware or
> > something)?
>
> Not at all; the printer has nothing to do with it. Printing is printing.
> Scanning is scanning.
>

Understood. This is an 'all in one' printer which has both printing
and scanning capabilities.

> > 2) Is the difference in image sizes due to the bpc (1 vs. 8) or
> > encoding (ccitt vs jpeg) fields?
>
> Could be.
>
> > 3) If yes, how to change them?
>
> One file is in (I think) tiff format. The other isn't. You didn't scan
> like and like from both devices.

There are not that many options to choose from the scan settings. You
just choose the dpi and that is about it.

I understand that we can't change much on what the scanner produces.
But there should be some software to further change the scanner's
output files?

-- 
Kamaraju S Kusumanchi | http://raju.shoutwiki.com/wiki/Blog



Re: [OT] scanned files are large in size

2019-01-01 Thread kamaraju kusumanchi
On Tue, Jan 1, 2019 at 1:40 PM  wrote:
>
> Yep. The one image is encoded as CCITT (aka Group 4, aka fax [1]), which is
> passable for low res B&W images, but not that much for hi-res or color (or
> gray scale). It compresses much worse than the other which is JPEG, which is
> expressly made for hi-res and color (or grayscale) images.
>
> OTOH, CCITT is lossless and JPEG lossy ;-)
>
ok, thanks.

> > Questions:
> > 1) Does the large file size have anything to do with the printer
> > itself? Is there anything I can do (ex:- update the driver/firmware or
> > something)?
>
> That depends on what is encoding the images: does the scanner itself
> "make" the PDF? Or some software, computer-side?
>

The scanner itself makes the pdf files.

> > 2) Is the difference in image sizes due to the bpc (1 vs. 8) or
> > encoding (ccitt vs jpeg) fields?
>
> CCITT vs JPEG, yes.
>

ok. What about bpc? Does that matter for file size?

> > 3) If yes, how to change them?
>
> Hmmm. I don't know yet whether you have to talk to your scanner
> or to your scan software...

I think there is not much I can do from the scanner's settings. But I
was wondering if I can use some software (like convert, gs etc.,) to
change the encoding, bpc etc., and reduce the file sizes without
sacrificing too much on the quality. To be honest, both files look the
same when viewed side by side.

-- 
Kamaraju S Kusumanchi | http://raju.shoutwiki.com/wiki/Blog



Re: [OT] scanned files are large in size

2019-01-01 Thread Brian
On Tue 01 Jan 2019 at 12:34:38 -0500, kamaraju kusumanchi wrote:

> A scanned document from Canon pixma mx870 printer is significantly
> larger compared to the same document scanned on a different scanner.

Which is...?

> When I look at both the images side by side on a PC, there is no
> visual difference between the two. I am trying to understand the
> underlying cause and fix it if possible.

You could mention which scanning software you used and what the
setting for the output file format was.

> As shown below, scanned_in_office.pdf is 332Kb, scanned_on_mx870.pdf is 1.7 Mb.
> 
> % ls -al scanned_in_office.pdf scanned_on_mx870.pdf
> -rw-r--r-- 1 rajulocal rajulocal  331796 Jan  1 11:54 scanned_in_office.pdf
> -rw-r--r-- 1 rajulocal rajulocal 1775460 Jan  1 11:48 scanned_on_mx870.pdf
> 
> Both are scanned at 600 dpi. The only difference I see is in bpc,
> enc fields.
> 
> % pdfimages -list scanned_in_office.pdf
> page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
>
>    1     0 image    5104  6600  gray    1   1  ccitt  no         7  0   601   600  183K 4.5%
>    2     1 image    5104  6600  gray    1   1  ccitt  no        14  0   601   600  138K 3.4%
> 
> % pdfimages -list scanned_on_mx870.pdf
> page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
>
>    1     0 image    5100  6600  gray    1   8  jpeg   no         8  0   600   600 1066K 3.2%
>    2     1 image    5100  6600  gray    1   8  jpeg   no        14  0   600   600  665K 2.0%
> 
> Questions:
> 1) Does the large file size have anything to do with the printer
> itself? Is there anything I can do (ex:- update the driver/firmware or
> something)?

Not at all; the printer has nothing to do with it. Printing is printing.
Scanning is scanning.

> 2) Is the difference in image sizes due to the bpc (1 vs. 8) or
> encoding (ccitt vs jpeg) fields?

Could be.

> 3) If yes, how to change them?

One file is in (I think) tiff format. The other isn't. You didn't scan
like and like from both devices.

-- 
Brian.



Re: [OT] scanned files are large in size

2019-01-01 Thread tomas
On Tue, Jan 01, 2019 at 07:55:55PM +0100, Anders Andersson wrote:
> On Tue, Jan 1, 2019 at 7:40 PM  wrote:
> 
> > On Tue, Jan 01, 2019 at 12:34:38PM -0500, kamaraju kusumanchi wrote:

[...]

> > OTOH, CCITT is lossless and JPEG lossy ;-)
> >
> 
> Not sure what you mean by "compresses much worse" here, but the CCITT
> version is much smaller than the JPEG version. Maybe you meant that CCITT
> looks worse after compression, which is weird when you also write that
> CCITT is lossless!

You're right -- but Thomas has the solution for this riddle (the CCITT
throws away the grayscale information -- so, lossy too, after all :-)

Cheers
-- t



Re: [OT] scanned files are large in size

2019-01-01 Thread Thomas Schmitt
Hi,

tomas wrote:
> > Yep. The one image is encoded as CCITT
> > It compresses much worse than the other which is JPEG,

Anders Andersson wrote:
> Not sure what you mean by "compresses much worse" here, but the CCITT
> version is much smaller than the JPEG version.

Because it uses 1 bit per channel/color whereas the other file uses
8 bpc. The fact that their size ratio is less than 1:8 demonstrates
the lossy compression power of the algorithm used in the 8 bpc image.
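
To put numbers on this, a rough sketch in POSIX shell (awk does the
arithmetic) that computes the uncompressed size of a page image from
the width, height and bpc columns reported by pdfimages, and the share
of that taken up by the stored size. The function name raw_ratio is
made up for illustration, and reading the "K" in the size column as
1024 bytes is an assumption:

```shell
# raw_ratio WIDTH HEIGHT BPC STORED_K
# Prints the uncompressed size in bytes and stored/raw as a percentage.
# Assumes one component per pixel (gray) and that "K" means 1024 bytes.
raw_ratio() {
    awk -v w="$1" -v h="$2" -v bpc="$3" -v k="$4" 'BEGIN {
        raw = w * h * bpc / 8          # bits per pixel -> bytes
        printf "%d bytes raw, stored is %.1f%%\n", raw, 100 * k * 1024 / raw
    }'
}

raw_ratio 5104 6600 1 183    # office scanner, page 1 (ccitt)
raw_ratio 5100 6600 8 1066   # mx870, page 1 (jpeg)
```

This gives 4210800 bytes raw at 4.5% for the CCITT page and 33660000
bytes raw at 3.2% for the JPEG page, matching the ratio column in the
pdfimages listing. The raw 8 bpc image is eight times the size of the
1 bpc one, yet the stored JPEG is under six times the size of the
stored CCITT image (1066K against 183K).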

kamaraju kusumanchi, the OP, might see differences between both files if
using a viewer that can zoom-in far enough that scan pixels of 1/600 inch
become larger than the screen pixels.


Have a nice day :)

Thomas



Re: [OT] scanned files are large in size

2019-01-01 Thread Anders Andersson
On Tue, Jan 1, 2019 at 7:40 PM  wrote:

> On Tue, Jan 01, 2019 at 12:34:38PM -0500, kamaraju kusumanchi wrote:
> > A scanned document from Canon pixma mx870 printer is significantly
> > larger compared to the same document scanned on a different scanner.
> > When I look at both the images side by side on a PC, there is no
> > visual difference between the two. I am trying to understand the
> > underlying cause and fix it if possible.
> >
> > As shown below, scanned_in_office.pdf is 332Kb, scanned_on_mx870.pdf is 1.7 Mb.
> >
> > % ls -al scanned_in_office.pdf scanned_on_mx870.pdf
> > -rw-r--r-- 1 rajulocal rajulocal  331796 Jan  1 11:54 scanned_in_office.pdf
> > -rw-r--r-- 1 rajulocal rajulocal 1775460 Jan  1 11:48 scanned_on_mx870.pdf
>
> Yep. The one image is encoded as CCITT (aka Group 4, aka fax [1]), which is
> passable for low res B&W images, but not that much for hi-res or color (or
> gray scale). It compresses much worse than the other which is JPEG, which is
> expressly made for hi-res and color (or grayscale) images.
>
> OTOH, CCITT is lossless and JPEG lossy ;-)
>

Not sure what you mean by "compresses much worse" here, but the CCITT
version is much smaller than the JPEG version. Maybe you meant that CCITT
looks worse after compression, which is weird when you also write that
CCITT is lossless!


Re: [OT] scanned files are large in size

2019-01-01 Thread tomas
On Tue, Jan 01, 2019 at 12:34:38PM -0500, kamaraju kusumanchi wrote:
> A scanned document from Canon pixma mx870 printer is significantly
> larger compared to the same document scanned on a different scanner.
> When I look at both the images side by side on a PC, there is no
> visual difference between the two. I am trying to understand the
> underlying cause and fix it if possible.
> 
> As shown below, scanned_in_office.pdf is 332Kb, scanned_on_mx870.pdf is 1.7 Mb.
> 
> % ls -al scanned_in_office.pdf scanned_on_mx870.pdf
> -rw-r--r-- 1 rajulocal rajulocal  331796 Jan  1 11:54 scanned_in_office.pdf
> -rw-r--r-- 1 rajulocal rajulocal 1775460 Jan  1 11:48 scanned_on_mx870.pdf
> 
> Both are scanned at 600 dpi. The only difference I see is in bpc,
> enc fields.

Yep. The one image is encoded as CCITT (aka Group 4, aka fax [1]), which is
passable for low res B&W images, but not that much for hi-res or color (or
gray scale). It compresses much worse than the other which is JPEG, which is
expressly made for hi-res and color (or grayscale) images.

OTOH, CCITT is lossless and JPEG lossy ;-)

> Questions:
> 1) Does the large file size have anything to do with the printer
> itself? Is there anything I can do (ex:- update the driver/firmware or
> something)?

That depends on what is encoding the images: does the scanner itself
"make" the PDF? Or some software, computer-side?

> 2) Is the difference in image sizes due to the bpc (1 vs. 8) or
> encoding (ccitt vs jpeg) fields?

CCITT vs JPEG, yes.

> 3) If yes, how to change them?

Hmmm. I don't know yet whether you have to talk to your scanner
or to your scan software...

Cheers

[1] https://en.wikipedia.org/wiki/Group_4_compression
-- tomás



[OT] scanned files are large in size

2019-01-01 Thread kamaraju kusumanchi
A scanned document from Canon pixma mx870 printer is significantly
larger compared to the same document scanned on a different scanner.
When I look at both the images side by side on a PC, there is no
visual difference between the two. I am trying to understand the
underlying cause and fix it if possible.

As shown below, scanned_in_office.pdf is 332Kb, scanned_on_mx870.pdf is 1.7 Mb.

% ls -al scanned_in_office.pdf scanned_on_mx870.pdf
-rw-r--r-- 1 rajulocal rajulocal  331796 Jan  1 11:54 scanned_in_office.pdf
-rw-r--r-- 1 rajulocal rajulocal 1775460 Jan  1 11:48 scanned_on_mx870.pdf

Both are scanned at 600 dpi. The only difference I see is in bpc,
enc fields.

% pdfimages -list scanned_in_office.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio

   1     0 image    5104  6600  gray    1   1  ccitt  no         7  0   601   600  183K 4.5%
   2     1 image    5104  6600  gray    1   1  ccitt  no        14  0   601   600  138K 3.4%

% pdfimages -list scanned_on_mx870.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio

   1     0 image    5100  6600  gray    1   8  jpeg   no         8  0   600   600 1066K 3.2%
   2     1 image    5100  6600  gray    1   8  jpeg   no        14  0   600   600  665K 2.0%
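
To compare the two files on just the columns that differ, a small awk
filter can pick them out of the pdfimages listing. This is a sketch
that assumes the usual 16-column layout of pdfimages -list (page, num,
type, width, height, color, comp, bpc, enc, interp, object, ID, x-ppi,
y-ppi, size, ratio); the function name summarize_images is made up:

```shell
# summarize_images: reads `pdfimages -list` output on stdin and prints
# page, bpc, encoding and stored size for every row of type "image".
# Field numbers assume the 16-column layout described above.
summarize_images() {
    awk '$3 == "image" { printf "page %s: %s bpc, %s, %s\n", $1, $8, $9, $15 }'
}

# Example:
#   pdfimages -list scanned_in_office.pdf | summarize_images
#   pdfimages -list scanned_on_mx870.pdf  | summarize_images
```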

Questions:
1) Does the large file size have anything to do with the printer
itself? Is there anything I can do (ex:- update the driver/firmware or
something)?
2) Is the difference in image sizes due to the bpc (1 vs. 8) or
encoding (ccitt vs jpeg) fields?
3) If yes, how to change them?

thanks
raju

-- 
Kamaraju S Kusumanchi | http://raju.shoutwiki.com/wiki/Blog