Re: Python question - first call is slower?

2012-06-20 Thread Nadav Har'El
On Wed, Jun 20, 2012, Meir Kriheli wrote about Re: Python question - first 
call is slower?:
  I considered, and discredited, the following attempted explanations:
...
its code gets done in 6 milliseconds; It's not a 12 millisecond pause
and then the rest of the function finishes in 2ms.
 
 
 Yeah, compiles to bytecode.

As far as I understand, the code is compiled to bytecode when it is
imported, *not* when it is first run, so it doesn't explain why the
first run of a function is slower. If you think it's otherwise, please
let me know.
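A minimal sketch of Nadav's point, assuming CPython: compile() turns source into bytecode up front (an import does the same, or loads a cached .pyc), and a function defined by running that module body already carries its code object before it is ever called. The module source and the name work() are hypothetical.

```python
import types

# Hypothetical module source; CPython compiles a module like this when it is imported.
SOURCE = """
def work(n):
    return sum(i * i for i in range(n))
"""

module_code = compile(SOURCE, "<example>", "exec")  # source -> bytecode, once
namespace = {}
exec(module_code, namespace)  # runs the module body: defines work(), does not call it

work = namespace["work"]
# The bytecode already exists before the first call.
print(isinstance(work.__code__, types.CodeType))  # True
print(work(4))  # 14
```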

 Nitpick: CPython doesn't (the one you're referring to as Python), other
 implementations may and will (PyPy, Jython, etc)

Yes, I'm talking about the python executable, which I guess is
CPython.

  3. If class A imports B which imports C which imports D, some of these
     classes are only read when the code is actually used for the first
     time. Again, I couldn't find any evidence that this is true in Python
     (unlike, e.g., Java). An import would read the whole class hierarchy
     into memory. Right?
 A module is loaded only once (see also: sys.modules)

Yes, this I know. But at first I thought maybe it is loaded lazily
somehow, so some things are loaded only when a function from the module
is actually run. But I can find no evidence of that - every bit of
documentation I can find suggests that the module, and every module it
refers to, recursively, are fully loaded at the time of the import of
the top module.
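To illustrate Meir's sys.modules remark: a sketch showing that importing an already-loaded module is just a lookup that returns the same module object. json is an arbitrary stdlib module picked for the example; if something else already imported it, both calls are lookups.

```python
import importlib
import sys

def import_twice(name):
    """Import a module twice and return both results."""
    first = importlib.import_module(name)   # loads it, unless something already did
    second = importlib.import_module(name)  # found in sys.modules, nothing is re-read
    return first, second

first, second = import_twice("json")
print(first is second)               # True
print(sys.modules["json"] is first)  # True
```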

 Can you post some code ? Without it this may be tough.

Yes, sorry. The specific code I saw this problem with is free software, so I
could theoretically post it, but it takes quite a bit of setup to get
started with so I don't think it would be very useful for others. I'll try
to see if I can recreate this problem with a much simpler piece of code
and report back.

 Do you have imports inside the function ?

I looked for these, and didn't find any.

 Do you access and affect globals ?

Why does that matter? Anyway, like I said the function doesn't keep any
state, so it's not like the second run notices the first run has already
done the job, and does nothing - or anything like that. Unless, of
course, I'm misunderstanding something in the code ;-)

 Is it a generator ?

What does this mean? Sorry, but my level of knowledge of Python is well
below my level in other programming languages... I guess I felt that I
already know one language too many :(

 Do you have default parameters ? Are they mutable ? Do they require some
 computation ?

This is a complicated setup, with one function calling another calling
..., so the whole thing probably uses every imaginable Python construct,
but probably not in a heavy way which can explain a 12ms (6-fold)
slowdown of the top function. I'll investigate.
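For reference on Meir's default-parameter question, a sketch of the behavior he is hinting at: default values are evaluated once, when the def statement runs (normally at import time), and a mutable default is shared between calls. The function names are made up for illustration.

```python
def add_item(item, bucket=[]):  # this list is created once, when def runs
    bucket.append(item)
    return bucket

print(add_item(1))  # [1]
print(add_item(2))  # [1, 2]: the same list object is reused across calls

def add_item_safe(item, bucket=None):
    if bucket is None:
        bucket = []  # a fresh list on every call
    bucket.append(item)
    return bucket

print(add_item_safe(1))  # [1]
print(add_item_safe(2))  # [2]
```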

 Is there a difference in the dataset between the runs (python caches small
 ints).

No, it's the same computation again. I'd be surprised if the number of
integers in this code explains a 12ms (6-fold) slowdown in that
function, but I'll investigate.
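Meir's small-int remark refers to a CPython implementation detail: integers from -5 to 256 are preallocated and shared, while larger ones are allocated as new objects. A quick check, assuming CPython (other implementations may behave differently); it affects object identity and allocation, not arithmetic results.

```python
# Build the numbers at run time so the compiler cannot fold them into one constant.
small_same = int("256") is int("256")  # both refer to the preallocated object
large_same = int("257") is int("257")  # two separately allocated objects
print(small_same, large_same)          # True False (on CPython)
```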

 Does it manipulate files (will be cached by the OS),

Yes, but I run it over and over with the same files, so they will
already be cached in all the runs (including the first, which is the
first only in this python process).

Thanks,
Nadav.

-- 
Nadav Har'El| Wednesday, Jun 20 2012, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |If marriage was illegal, only outlaws
http://nadav.harel.org.il   |would have in-laws.

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il


Re: Python question - first call is slower?

2012-06-20 Thread Oleg Goldshmidt
On Wed, Jun 20, 2012 at 12:27 AM, Nadav Har'El n...@math.technion.ac.il wrote:

 Hi, I have run across a puzzling issue in Python, and I wonder if anyone
 on the list can explain it.

 I have a python function which takes some input and produces some
 output - it doesn't keep permanent state, and presumably running it
 twice would do exactly the same thing twice, and take exactly the same
 time.

 But strangely, it doesn't - the first call takes 14 milliseconds, while
 the second and all subsequent calls take only 2 milliseconds each.
 Does anybody have any idea why this can happen?


Guessing wildly (I don't know much about Python's inner workings, thinking
system here): maybe some libraries get dynamically loaded into memory the
first time and reused later?


 I considered, and discredited, the following attempted explanations:

 ...


 3. If class A imports B which imports C which imports D, some of these
   classes are only read when the code is actually used for the first
   time. Again, I couldn't find any evidence that this is true in Python
   (unlike, e.g., Java). An import would read the whole class hierarchy
   into memory. Right?


I would - naively! - think that stuff (not only, or necessarily, the
python code stuff, but also what it needs from the system) would indeed be
loaded dynamically when it is needed. I don't base this on anything I know,
but on the intuitive notion that whatever Python does works as normal C
libraries - it's implemented in C, right?

Another (kinda related) possibility - some caching going on?

What happens if you run the program twice - is the first call slow every
time or just the first time?
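A simple way to answer Oleg's question is to time each call separately within one process, then run the whole script twice. A sketch using time.perf_counter(); work() is a hypothetical stand-in for the real function.

```python
import time

def time_calls(func, *args, repeat=5):
    """Return the wall-clock duration, in milliseconds, of each of `repeat` calls."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations

def work(n):
    # Hypothetical placeholder for the function being investigated.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    for i, ms in enumerate(time_calls(work, 100_000), start=1):
        print(f"call {i}: {ms:.2f} ms")
```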

In general, it does not surprise me that something - whatever - runs slower
the first time than subsequent times - it happens often.

-- 
Oleg Goldshmidt | p...@goldshmidt.org o...@goldshmidt.org


Re: Digikam image re-compression - is it reliable?

2012-06-20 Thread Nadav Har'El
On Wed, Jun 20, 2012, Amos Shapira wrote about Re: Digikam image 
re-compression - is it reliable?:
 They are in JPG, not RAW. exif is copied over.
 Minimal compression setting (whatever that means on the camera's user
 interface).

It is possible that the minimal compression option exists not because
it is recommended, but because the marketing people demanded it, and
you're actually supposed to use the better compressed options named
something like "fine" or "normal".

Just as an example of what you might be wasting, I took a 12 megapixel
(3000x4000) family photo, and saw the following sizes:

8.9 MB - lossless compression (PNG)
4.2 MB - JPEG at 100% setting
3.5 MB - JPEG at 99% setting
2.4 MB - JPEG at 95% setting
1.8 MB - JPEG at 90% setting
1.4 MB - JPEG at 85% setting
1.1 MB - JPEG at 80% setting
0.8 MB - JPEG at 75% setting
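To put the table in relative terms, each size can be divided by the 100% file. This only reuses Nadav's numbers for this one photo (in MB); other pictures will compress differently.

```python
# Sizes in MB for one 12-megapixel photo, copied from the table above.
SIZES_MB = {
    "PNG (lossless)": 8.9,
    "JPEG 100%": 4.2,
    "JPEG 99%": 3.5,
    "JPEG 95%": 2.4,
    "JPEG 90%": 1.8,
    "JPEG 85%": 1.4,
    "JPEG 80%": 1.1,
    "JPEG 75%": 0.8,
}

def relative_to(reference, sizes):
    """Return each size as a whole-number percentage of sizes[reference]."""
    base = sizes[reference]
    return {name: round(100 * mb / base) for name, mb in sizes.items()}

for name, pct in relative_to("JPEG 100%", SIZES_MB).items():
    print(f"{name:15} {pct:4d}%")
```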

So as you can see, you can significantly reduce your file size by not
insisting on minimal compression: if that means lossless compression, or
JPEG at the 100% or 99% setting, you can achieve much better compression.
I'd go with 95% or even 90%, and I don't think you'll ever notice a
difference (though I don't presume to be an expert on the subject). I
wouldn't go down to 75% unless you're really short on space - remember
that in 10 years, you'll be laughing at these sizes which you once
thought were large ;-)

Nadav.

-- 
Nadav Har'El| Wednesday, Jun 20 2012, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |A fine is a tax for doing wrong. A tax is
http://nadav.harel.org.il   |a fine for doing well.



Re: Python question - first call is slower?

2012-06-20 Thread Nadav Har'El
On Wed, Jun 20, 2012, Oleg Goldshmidt wrote about Re: Python question - first 
call is slower?:
 I would - naively! - think that stuff (not only, or necessarily, the
 python code stuff, but also what it needs from the system) would indeed be
 loaded dynamically when it is needed. I don't base this on anything I know,
 but on the intuitive notion that whatever Python does works as normal C
 libraries - it's implemented in C, right?

This is a good angle to investigate, although since I noticed the
function becomes uniformly slower (after 1/3rd of the code, it spends
1/3rd of 14 milliseconds), it would be strange that I am loading so many
shared libraries for this to be so uniform, but I guess anything is
possible.
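Since the slowdown seems spread evenly over the function, profiling the first and second calls separately and comparing the two reports may show which callees account for the difference. A sketch using the standard cProfile and pstats modules; profile_call() and work() are hypothetical names.

```python
import cProfile
import io
import pstats

def profile_call(func, *args, top=10):
    """Profile a single call of func(*args) and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()

def work(n):
    # Hypothetical placeholder for the function being investigated.
    return sorted(str(i) for i in range(n))

if __name__ == "__main__":
    print("=== first call ===")
    print(profile_call(work, 10_000))
    print("=== second call ===")
    print(profile_call(work, 10_000))
```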

 Another (kinda related) possibility - some cacheing going on?
 What happens if you run the program twice - is the first call slow every
 time or just the first time?

No, every time I run the program, the first call is slow and the
subsequent calls are fast. So it's not disk reads or anything which is
cached by the system between runs.

 In general, it does not surprise me that something - whatever - runs slower
 the first time than subsequent times - it happens often.

Yes, it's sad that computer software has become so complex that often you
don't understand what your program *really* does, and you start to
accept annoying phenomena as laws of nature :( In a few years, when
software is as complex as human brains, I guess it would be normal to
explain that some software is slower today because it is depressed ;-)

-- 
Nadav Har'El| Wednesday, Jun 20 2012, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Spelling mistakes left in for people who
http://nadav.harel.org.il   |feel the need to correct others.



Re: Digikam image re-compression - is it reliable?

2012-06-20 Thread Amos Shapira
(Sorry was meant to reply to the list as well)

Thanks Nadav. I'll try the 90% and 95% options.
The trouble is that I need to hand the disk-on-key over to the courier in a
couple of hours, so I might just stick with the un-recompressed files if they still fit.


-- 
http://www.linkedin.com/in/gliderflyer


Re: Python question - first call is slower?

2012-06-20 Thread Amos Shapira
On 20 June 2012 16:46, Nadav Har'El n...@math.technion.ac.il wrote:

 On Wed, Jun 20, 2012, Oleg Goldshmidt wrote about Re: Python question -
 first call is slower?:
  I would - naively! - think that stuff (not only, or necessarily, the
  python code stuff, but also what it needs from the system) would indeed be
  loaded dynamically when it is needed. I don't base this on anything I know,
  but on the intuitive notion that whatever Python does works as normal C
  libraries - it's implemented in C, right?

 This is a good angle to investigate, although since I noticed the
 function becomes uniformly slower (after 1/3rd of the code, it spends
 1/3rd of 14 milliseconds), it would be strange that I am loading so many
 shared libraries for this to be so uniform, but I guess anything is
 possible.


How about strace -ttt and ltrace -ttt? Could they reveal where your
missing time is spent?

 In general, it does not surprise me that something - whatever - runs slower
 Yes, it's sad that computer software has become so complex that often you
 don't understand what your program *really* does, and you start to
 accept annoying phenomena as laws of nature :( In a few years, when
 software is as complex as human brains, I guess it would be normal to
 explain that some software is slower today because it is depressed ;-)


Add to that code writers (I wouldn't dignify what they do by labeling it as
programming) who don't have the slightest idea of how even simple
programs really work and you end up with a vicious circle of wasted CPU
cycles.

--Amos

http://www.linkedin.com/in/gliderflyer


install console-only system]

2012-06-20 Thread Avraham Rosenberg
Hi,

   I downloaded the debian-wheezy-DI-a1-i386-netinst.iso image and tried to
install debian wheezy, as a console-only (no X) system.
   I chose locale C, manual partitioning and the ext3 file system. At the
tasksel stage, I chose the option "basic system only".
   Everything went smoothly, except for grub not identifying my old squeeze
system (I had been forewarned!). But when I booted the new system, my graphics
card (lspci: VGA compatible controller: ATI Technologies Inc RV610
[Radeon HD 2400 XT]) refused to talk to the monitor (Flatron w2271TC) and I
was stuck. The only way out I could think of was to use the squeeze install
disk in rescue mode and update grub.
   But, of course, I am not able to use the wheezy partition. Any
suggestions?
   The machine is a Sun Ultra 20 station.
   
   Thanks, Avraham

-- 
Please avoid sending Excel or Powerpoint attachments to this address.



How can I explore what is causing my laptop to not come out of suspend properly when the lid is opened?

2012-06-20 Thread Michael Shiloh
For years I had been hibernating my laptop (Lenovo T60 and now T61) 
instead of shutting down, and of course opening the lid did nothing 
until I pressed the power button. Besides the long amount of time it 
would take to come out of hibernation, this SEEMED to work fine, 
although sometimes I was presented with a login screen instead of an 
unlock screen, suggesting that I was booting up fresh rather than simply 
coming out of hibernation.


Recently I've learned that suspend is quite reliable and of course much 
faster. I suspend either automatically on lid closure, or manually, and 
come out of suspend automatically when I open the lid.


Occasionally, coming out of suspend fails. The power indicator light is 
on, as well as bluetooth and wifi. Wifi is even blinking occasionally, 
but I don't know what this means. The disc activity light is off.


The only keys that are recognized are the NmLk and the little lamp 
that lights up the keyboard. I suspect these are handled by a 
microcontroller running the keyboard and not the main processor.


Every other key and key combination I can think of is ignored, e.g. 
Ctrl Alt F1 etc. to get a console login.


I have tried closing and reopening the lid, applying and removing 
external power, and pressing every single key, along with every 
combination of Shift Ctrl and Alt, as well as the blue Fn 
button. Other than the numlock and keyboard lamp, nothing has any effect.


I have tried both hibernating automatically on lid closure, and 
hibernating manually prior to lid closure. The problem seems worse when 
I hibernate automatically, but this is not a terribly scientific conclusion.


I realize now that I may have been seeing the same problem when coming 
out of hibernation.


I recognize that the problem may not have been caused by a problem
starting up, but rather by some error while hibernating or suspending.


What can I do to debug this? Any suggestions, comments, and ideas would 
be appreciated.


Michael



Re: How can I explore what is causing my laptop to not come out of suspend properly when the lid is opened?

2012-06-20 Thread Michael Shiloh
I forgot to mention I'm running Xubuntu 12.04 which I keep thoroughly up 
to date.


Michael Shiloh
Artist, designer, tinkerer, teacher, geek
KA6RCQ
www.teachmetomake.com
www.teachmetomake.com/wordpress
teachmetomake.wordpress.com
groups.google.com/group/teach-me-to-make
michaelshiloh.pbworks.com





Re: Digikam image re-compression - is it reliable?

2012-06-20 Thread Shachar Shemesh
On 06/20/2012 06:13 AM, Amos Shapira wrote:
 Hi,

 I'm preparing a disk-on-key with family photos to send to my mum and
 noticed something a bit unexpected.
 Most of the photos were taken with a Canon EOS 300D, maximum
 resolution and minimum compression.
 Some were taken with Android phone and iPhone 4.
 I use Digikam on Debian to manage my photos.
 The total space of the original images (including movies, which
 weren't touched) was ~7.6Gb.
 The total space after re-compression using default parameters (75%,
 JPEG, no resizing) -  1Gb.

Here's a brief, and probably completely incorrect on several counts,
explanation of what JPEG compression does (and, for that matter, also
the single frame compression element of MPEG, MPEG2, MPEG4, H264 and
just about any lossy picture compression).

The picture is divided into squares (8x8 pixels in JPEG). Each square is
processed with an algorithm called Discrete Cosine Transform (or DCT). If
you know the Fourier transform, this is essentially the same thing, only
in 2D. The result is a square of the same size, but with each component
in it representing some frequency, rather than a single pixel.
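A minimal sketch of the transform Shachar describes: an orthonormal 2-D DCT-II on one 8x8 block, written straight from the textbook formula in plain Python (slow, but dependency-free; real encoders use fast integer or floating-point versions). It shows that a featureless block puts all of its energy into the top-left (DC) coefficient.

```python
import math

N = 8  # JPEG transforms 8x8 blocks

def alpha(k):
    """Orthonormal scaling factor for frequency index k."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out

flat = [[10] * N for _ in range(N)]   # a featureless block: every pixel is 10
coeffs = dct2(flat)
ac = [coeffs[u][v] for u in range(N) for v in range(N) if (u, v) != (0, 0)]
print(round(coeffs[0][0], 6))          # 80.0: all the energy is in the DC term
print(max(abs(c) for c in ac) < 1e-9)  # True: the other 63 coefficients are ~0
```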

And here's the thing. Some positions in this square are more important
than others. The practical upshot is that getting the value wrong for
some of the positions in this square produces errors in the picture that
are more visible to the human eye than getting others wrong.
Coincidentally, some positions in this square also tend to have lower
values (formally, the waves these positions represent have a lower
energy in the actual picture). The encoding allows the final image
format to omit part of the square rather than store all of it.
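In JPEG, the step that drops part of the square is quantization: each coefficient is divided by a step size and rounded, so small high-frequency coefficients become zero and cost almost nothing to store. A sketch with hypothetical coefficients and step sizes; real encoders use 8x8 step tables, and libjpeg, for example, scales its table according to the quality setting.

```python
def quantize(coeffs, steps):
    """Divide each DCT coefficient by its step size and round to an integer level."""
    return [round(c / q) for c, q in zip(coeffs, steps)]

def dequantize(levels, steps):
    """What a decoder can recover: level * step; the rounding error is gone for good."""
    return [level * q for level, q in zip(levels, steps)]

# Hypothetical coefficients, ordered from low to high frequency, and hypothetical
# step sizes, generally larger for higher frequencies (lower quality = larger steps).
coeffs = [80.0, -23.5, 6.2, 3.1, -1.4, 0.6]
steps = [16, 11, 10, 16, 24, 40]

levels = quantize(coeffs, steps)
print(levels)                     # [5, -2, 1, 0, 0, 0]
print(dequantize(levels, steps))  # [80, -22, 10, 0, 0, 0]
```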

So, for lossless JPEG, all you do is take those components that have
energy, and use those. This still provides a considerable saving on the
uncompressed size. You didn't say how much each picture took, but an
uncompressed 24 bits/pixel 1920x1280 image will take a little over 7MB.
Lossless compression should save about half of that. Lossless JPEG can,
depending on the actual picture, be about 3MB. Allowing even a small
amount of lossiness (say, JPEG 95%) should bring you down to about 2MB,
depending on the actual picture. As usual, the law of diminishing
returns is in effect: the first reductions in size cost you few visual
artifacts, and later ones cost much more.
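The "a little over 7MB" figure is just width x height x bytes per pixel; a quick check that ignores file headers and metadata:

```python
def raw_size_bytes(width, height, bits_per_pixel=24):
    """Uncompressed pixel data size in bytes (headers and metadata ignored)."""
    return width * height * bits_per_pixel // 8

size = raw_size_bytes(1920, 1280)
print(size)                    # 7372800
print(round(size / 1e6, 2))    # 7.37 (decimal megabytes)
print(round(size / 2**20, 2))  # 7.03 (binary megabytes, MiB)
```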

I hope this enhances your understanding, and therefore your ability to
rely on the compression.

Shachar


-- 
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com



Re: How can I explore what is causing my laptop to not come out of suspend properly when the lid is opened?

2012-06-20 Thread ronys
You might want to check the Lenovo support website to see if they've a
BIOS/driver update for your model that addresses this issue.

Rony





-- 
Ubi dubium, ibi libertas (where there is doubt, there is freedom)


Re: Python question - first call is slower?

2012-06-20 Thread Daniel Shahaf
Nadav Har'El wrote on Wed, Jun 20, 2012 at 09:20:04 +0300:
 On Wed, Jun 20, 2012, Meir Kriheli wrote about Re: Python question - first 
 call is slower?:
  Is it a generator ?
 
 What does this mean? Sorry, but my level of knowledge of Python is well
 below my level in other programming languages... I guess I felt that I
 already know one language too many :(
 

Does it contain a 'yield' statement?
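For Nadav's benefit, a sketch of what Daniel means: a function containing yield is a generator function, and calling it only creates a generator object; none of its body runs until something iterates over it, so timing the call alone measures almost nothing. numbers() is a made-up example.

```python
events = []  # records when the generator body actually starts running

def numbers(n):
    events.append("body started")
    for i in range(n):
        yield i

gen = numbers(3)   # creates a generator object; the body has not run yet
print(events)      # []
print(list(gen))   # [0, 1, 2]  (iterating is what runs the body)
print(events)      # ['body started']
```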



Re: Python question - first call is slower?

2012-06-20 Thread Dan Kenigsberg
On Wed, Jun 20, 2012 at 09:20:04AM +0300, Nadav Har'El wrote:
 On Wed, Jun 20, 2012, Meir Kriheli wrote about Re: Python question - first 
 call is slower?:
   I considered, and discredited, the following attempted explanations:
 ...
 its code gets done in 6 milliseconds; It's not a 12 millisecond pause
 and then the rest of the function finishes in 2ms.
  
  
  Yeah, compiles to bytecode.
 
 As far as I understand, the code is compiled to bytecode when it is
 imported, *not* when it is first run, so it doesn't explain why the
 first run of a function is slower. If you think it's otherwise, please
 let me know.
 
  Nitpick: CPython doesn't (the one you're referring to as Python), other
  implementations may and will (PyPy, Jython, etc)
 
 Yes, I'm talking about the python executable, which I guess is
 CPython.
 
   3. If class A imports B which imports C which imports D, some of these
      classes are only read when the code is actually used for the first
      time. Again, I couldn't find any evidence that this is true in Python
      (unlike, e.g., Java). An import would read the whole class hierarchy
      into memory. Right?
  A module is loaded only once (see also: sys.modules)
 
 Yes, this I know. But at first I thought maybe it is loaded lazily
 somehow, so some things are loaded only when a function from the module
 is actually run. But I can find no evidence of that - every bit of
 documentation I can find suggests that the module, and every module it
 refers to, recursively, are fully loaded at the time of the import of
 the top module.
 

I think Meir's point is that if your code eventually calls a stdlib function
like

    def f():
        import foo  # foo.py is read and executed only on the first call
        return foo.bar()

the first call to the function would require a tedious load of foo.py,
which would be avoided on the second call (foo is then found in sys.modules).
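Dan's scenario can be reproduced deterministically with a throwaway module that sleeps at import time. A sketch, assuming a writable temporary directory; slow_mod and bar() are hypothetical stand-ins for foo and foo.bar().

```python
import os
import sys
import tempfile
import time

# Write a hypothetical module whose import takes at least 50 ms.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "slow_mod.py"), "w") as fh:
    fh.write("import time\ntime.sleep(0.05)\n\ndef bar():\n    return 42\n")
sys.path.insert(0, tmpdir)

def f():
    import slow_mod  # runs on every call, but only the first one loads the file
    return slow_mod.bar()

def timed(func):
    """Return (result, elapsed seconds) for one call of func()."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start

first = timed(f)   # pays for reading and executing slow_mod.py
second = timed(f)  # slow_mod is already in sys.modules
print(f"first:  {first[1] * 1000:.1f} ms")
print(f"second: {second[1] * 1000:.3f} ms")
```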

I guess that `strace -eopen python` would reveal plenty of open(2)s during the
first computation.

Dan.



Re: Digikam image re-compression - is it reliable?

2012-06-20 Thread Amos Shapira
Thanks.





-- 
http://www.linkedin.com/in/gliderflyer


Re: Digikam image re-compression - is it reliable?

2012-06-20 Thread Udi Finkelstein
On Wed, Jun 20, 2012 at 9:53 PM, Shachar Shemesh shac...@shemesh.biz wrote:

   So, for lossless JPEG, all you do is take those components that have
 energy, and use those. This still provides a considerable saving on the
 uncompressed size. You didn't say how much each picture took, but an
 uncompressed 24bits/pixel 1920x1280 image will take a little over 7MB.
 Lossless compression should save about half of that. Lossless JPEG can,
 depending on the actual picture, be about 3MB. Allowing even a small amount
 of lossiness (say, JPEG 95%) should bring you down to about 2MB, depending
 on the actual picture. As usual, the law of diminishing returns is in
 effect. You pay little visual artifacts for the initial reduction of size,
 and much more later.


As far as I know, there is no such thing as lossless JPEG.

Due to the DCT (Discrete Cosine Transform) you mentioned above, you cannot
take a square of 8x8 pixels and get an exact DCT result, because you always
lose precision, either by going to floating point or by using finite-precision
integers. Perhaps you could get lossless compression if you used so many bits
that the whole thing became pointless, because a PNG image would be smaller.

Therefore, using JPEG for lossless images is futile. If you want lossless,
go the PNG way. If you are willing to accept some image loss (and control how
much), JPEG or other more advanced formats such as JPEG2000 (wavelet-based
compression) are more suitable.


 I hope this enhances your understanding, and therefore your ability to rely
 on the compression.

 Shachar


 Udi