Re: [ccp4bb] diffraction images images/jpeg2000

2007-08-24 Thread Winter, G (Graeme)
Hi James, 

On the gathering of data from all possible beamline / source /
detector combinations below, I am also keen to get hold of these. To
assist with this I have written a couple of bash shell scripts which
will tar, bzip2 and split the data into 128MB chunks, then reverse the
process to ensure that the images are correctly preserved. I am sure
that people will rise to the challenge of improving them, but they work
for me...

Scripts follow:

packer.sh

This will tar up the source directory into files named with the
destination prefix, and compute md5sums for the resulting files.

Usage: packer.sh ./path/to/image/dir prefix  (produces prefix.aaa,
prefix.aab, etc.)

--- packer.sh ---
#!/bin/bash

# check input arguments
if [ $# -ne 2 ] ; then
    echo "Usage: $0 source destination"
    exit 1
fi

export source=${1}
export dest=${2}

echo "Packing ${source} to ${dest}"

# tar the source directory, compress, and split into 128MB chunks
tar cvf - ${source} | bzip2 | split -a 3 -b 128m - ${dest}.

# record checksums so the chunks can be verified later
md5sum ${dest}.* > ${dest}.md5
---


unpacker.sh

Performs the inverse of the operation above - checks the md5sums and
unpacks the original directory structure...

Usage: unpacker.sh prefix ./path/to/destination/dir


--- unpacker.sh ---
#!/bin/bash

# check input arguments
if [ $# -ne 2 ] ; then
    echo "Usage: $0 source destination"
    exit 1
fi

export source=${1}
export dest=${2}

# explain what we are doing
echo "Unpacking ${source} to ${dest}"

# check the md5sums of the chunks before unpacking anything
md5sum -c ${source}.md5

if [ $? -ne 0 ] ; then
    echo "Checksum mismatch - not unpacking"
    exit 1
fi

# make sure the destination exists, then unpack the chunks in order
mkdir -p ${dest}
cat `ls ${source}.* | grep -v md5 | sort` | bunzip2 | tar xvf - -C ${dest}
---

Tested on my Intel OS X Mac, but I think they should work fine on Linux
and perhaps any other modern-ish UNIX installation with the GNU
coreutils available.

Cheers,

Graeme 


Re: [ccp4bb] diffraction images images/jpeg2000

2007-08-24 Thread Winter, G (Graeme)
Hi James,

The old mar345 images were compressed with the pack format which Bill is
referring to. This is supported in CBFlib.

PNG and jpeg2000 may well do better at compression (I would like to see
the numbers on this) but are likely to be much slower than something
customised for use with diffraction images. Anything doing complex
mathematical analysis is likely to be slow...

On an example set packed with the scripts in another email I got a
compression ratio with bzip2 of 3.49:1 for 270 frames. This exceeds the
value you quote below, but was from images where some of the detector
was unused, where the packing would probably work better. 
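
For anyone who wants to reproduce this kind of number, a minimal sketch
using Python's standard bz2 module follows. The frames/*.img layout is
an assumption, and compressing frames individually will differ a little
from running bzip2 over a single tar stream as the scripts do:

--- ratio_check.py (sketch) ---
#!/usr/bin/env python
# Estimate the overall bzip2 compression ratio over a set of frames.
import bz2
import glob

raw_total = 0
packed_total = 0
for name in sorted(glob.glob("frames/*.img")):   # assumed file layout
    data = open(name, "rb").read()
    raw_total += len(data)
    packed_total += len(bz2.compress(data))

print("compression ratio: %.2f:1" % (raw_total / float(packed_total)))
---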

On the question of lossy compression, I think we'd have to ask some data
reduction gurus how much the noise would affect the data reduction. I
suspect that the main problem is that the noise added would be
correlated across the image and would therefore affect the background
statistics in a non-trivial way. Although the intensity measurements may
not be badly affected, the error estimates on them could be...

Cheers,

Graeme

-Original Message-
From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On Behalf Of
James Holton
Sent: 23 August 2007 18:47
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] diffraction images images/jpeg2000

Well, I know it's not the definitive source of anything, but the
Wikipedia entry on JPEG2000 says: "The PNG (Portable Network Graphics)
format is still more space-efficient in the case of images with many
pixels of the same color, and supports special compression features that
JPEG 2000 does not."

So would PNG be better?  It does support 16 bit greyscale.  Then again,
so does TIFF, and Mar already uses that.  Why don't they use the LZW
compression feature of TIFF?  The old Mar325 images were compressed
after all. I think only Mar can answer this, but I imagine the choice to
drop compression was because the advantages of compression (a factor of
2 or so in space) are outweighed by the disadvantages (limited speed and
limited compatibility with data processing packages).

How good could lossless compression of diffraction images possibly be?  
I just ran an entropy calculation on the 44968 images on /data at the
moment at ALS 8.3.1.  I am using a feature of Andy Hammersley's program
FIT2D to compute the entropy.  I don't pretend to completely
understand the algorithm, but I do understand that the entropy of the
image reflects the maximum possible compression ratio.  For these
images, the theoretical maximum compression ratio ranged from 1.2 to
4.8 with mean 2.7 and standard deviation 0.7.  The values for Huffman
encoding ranged from 0.95 to 4.7 with mean 2.4 and standard deviation
1.0.  The correlation coefficient between the Huffman and theoretical
compression ratio was 0.97.  I had a look at a few of the outlier cases.
As one might expect, the best compression ratios are from blank images
(where all the high-order bits are zero).  The #1 hardest-to-compress
image had many overloads, clear protein diffraction and a bunch of ice
rings. 
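
For readers who want to try this without FIT2D, a minimal sketch of the
zero-order entropy bound follows. It treats each 16-bit pixel as an
independent symbol, which is a simplification of whatever FIT2D does;
the file name and the headerless raw unsigned-16-bit layout are
assumptions:

--- entropy_bound.py (sketch) ---
#!/usr/bin/env python
# Zero-order Shannon entropy of the pixel histogram, and the implied
# best-case compression ratio for any per-pixel entropy coder.
import numpy as np

pixels = np.fromfile("frame_001.img", dtype=np.uint16)  # assumed layout
counts = np.bincount(pixels.astype(np.int64), minlength=65536)
p = counts[counts > 0] / float(pixels.size)
entropy = -(p * np.log2(p)).sum()               # bits per pixel
print("entropy: %.2f bits/pixel" % entropy)
print("theoretical max ratio: %.2f:1" % (16.0 / entropy))
---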

So, unless I am missing something, I think the best we are going to get
with lossless compression is about 2.5:1.  At least, for individual
frames.  Compressing a data set as a video sequence might have
substantial gains since only a few pixels change significantly from
frame-to-frame.  Are there any lossless video codecs out there?  If so,
can they handle 6144x6144 video?
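
The inter-frame idea is easy to prototype even without a video codec:
store the first frame as-is, then compress the pixel-wise differences
between successive frames, which are mostly near zero. A minimal sketch,
with bzip2 standing in for a real codec and identically-shaped uint16
frames assumed:

--- delta_pack.py (sketch) ---
#!/usr/bin/env python
# Compress a sequence of frames by bzip2-ing successive differences.
# Deltas are widened to int32 so the subtraction cannot wrap around.
import bz2
import numpy as np

def delta_compress(frames):
    """frames: list of uint16 numpy arrays of identical shape."""
    chunks = [bz2.compress(frames[0].tobytes())]
    for prev, cur in zip(frames, frames[1:]):
        diff = cur.astype(np.int32) - prev.astype(np.int32)
        chunks.append(bz2.compress(diff.tobytes()))
    return chunks
---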

  What about lossy compression?  Yes yes, I know it sounds like a
horrible idea to use lossy compression on scientific data, because it
would change the values of that most precious of numbers: Fobs.  
However, the question I have never heard a good answer to is HOW MUCH
would it change Fobs?  More practically: how much compression can you do
before Fobs changes by more than the value of SIGFobs?  Diffraction
patterns are inherently noisy.  If you take the same image twice, then
photon counting statistics make sure that no two images are exactly the
same.  So which one is right?  If the changes in pixel values from a
lossy compression algorithm are always smaller than that introduced by
photon-counting noise, then is lossy compression really such a bad idea?
The errors introduced could be small when compared to errors in say,
scale factors or bulk solvent parameters.  A great deal can be gained in
compression ratio if only random noise is removed.  I remember the
days before MP3 when it was lamented that sampled audio files could
never be compressed very well.  Even today bzip2 does not work well at
all on sampled audio (about 1.3:1), but
mp3 files can be made at a compression ratio of 10:1 over CD-quality
audio and we all seem to still enjoy the music.

I suppose the best lossy compression is the one that preserves the
features of the image you want and throws out the stuff you don't care
about.  So, in a way, data-reduction programs are probably the best
lossy compression we are going to get.  Unfortunately

Re: [ccp4bb] diffraction images images/jpeg2000

2007-08-24 Thread Harry Powell

Hi

Lossy compression should be okay, provided that the errors introduced are 
smaller than those expected for counting statistics (assuming that the 
pixels are more-or-less independent) - i.e. less than the square-root of 
the individual pixel intensities (though I don't see why this can't be 
extended to the integrated reflection intensities). So it's more important 
to accurately retain your weak pixel values than your strong ones - an 
error of ±10 for a pixel in a background count where the background should 
be 40 is significant, but an error of ±10 for a saturated pixel on most 
detectors (say, about 64K for a CCD) wouldn't affect anything.
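
This criterion translates directly into a simple lossy scheme - a sketch
only, not something any detector vendor ships: quantize each pixel on a
square-root scale so the rounding error stays below the Poisson sigma.
The knob q is an illustrative assumption; the worst-case reconstruction
error is about q*sqrt(I), so q = 0.5 keeps it at half the counting-
statistics error:

--- sqrt_quant.py (sketch) ---
#!/usr/bin/env python
# Square-root quantization: code = round(sqrt(I)/q), I ~ (code*q)**2.
# The step size in I grows like 2*q*sqrt(I), so the rounding error is
# about q*sqrt(I) at worst - below the Poisson sigma for q < 1.
import numpy as np

def sqrt_encode(img, q=0.5):
    return np.round(np.sqrt(img.astype(float)) / q).astype(np.uint16)

def sqrt_decode(codes, q=0.5):
    return np.round((codes.astype(float) * q) ** 2).astype(np.uint32)
---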





Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills
Road, Cambridge, CB2 2QH


Re: [ccp4bb] diffraction images images/jpeg2000

2007-08-24 Thread Gerard Bricogne
Dear all,

 I think we need to stop and think right here. The errors in pixel
values of images are neither Poisson (i.e. forget about taking square roots)
nor independent. Our ideas about image statistics are already disastrously
poor enough: the last thing we need is to make matters even worse by using
compression methods based on those erroneous statistical arguments!


 With best wishes,
 
  Gerard.



-- 

 ========================================================
 *                                                      *
 * Gerard Bricogne                  [EMAIL PROTECTED]   *
 *                                                      *
 * Global Phasing Ltd.                                  *
 * Sheraton House, Castle Park  Tel: +44-(0)1223-353033 *
 * Cambridge CB3 0AX, UK        Fax: +44-(0)1223-366889 *
 *                                                      *
 ========================================================


Re: [ccp4bb] diffraction images images/jpeg2000

2007-08-24 Thread Harry Powell


Wow.

I don't know about the rest of you, but I got told three times.

Gerard is, of course, right about pixel non-independence (think point 
spread function, among other things), and I wouldn't care to argue 
statistics with him, but as far as I know (and I could well be wrong) most 
of the integration programs out there _do_ use counting statistics (i.e. 
Poisson statistics) at least as a first approximation for the random error 
in measurement; this may be modified by some detector inefficiency 
factor (see Borek, Minor & Otwinowski, Acta Cryst. (2003) D59, 2031-
2038), but it's still there and being used by everyone, nonetheless.
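
For concreteness, that kind of first-approximation error model looks
like the sketch below - Poisson counting variance inflated by an
empirical detector factor. The parameter names and default values are
illustrative assumptions, not the model of any particular program or of
the Borek et al. paper:

--- pixel_sigma.py (sketch) ---
#!/usr/bin/env python
# First-approximation random error on a pixel: counting variance scaled
# by a detector factor, plus a constant read-out term.
import numpy as np

def pixel_sigma(counts, gain=1.0, detector_factor=1.2, readout=0.0):
    variance = detector_factor * gain * counts + readout ** 2
    return np.sqrt(variance)
---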


Having said that, regarding the storage of images, my personal feeling is 
that there's no real point in using a lossy compression when there are 
good lossless systems out there. I also think that almost no-one would 
ever bother to reprocess deposited images anyway; my guess is that 
unusual structures would be detected by other means, and that examining 
the original images would rarely shed light on the problem.



Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills
Road, Cambridge, CB2 2QH


Re: [ccp4bb] diffraction images images/jpeg2000

2007-08-23 Thread James Holton
 
about your data collection and you wouldn't need to come up with 
inventive tags for data items that might be required for other 
(general purpose) image formats.


There are even conversion programs available to convert to imgCIF/CBF 
files from some native formats - if your favourite detector isn't one 
of these, drop Herb Bernstein a line and ask for support ;-)




Harry


Re: [ccp4bb] diffraction images images/jpeg2000

2007-08-21 Thread William Scott
 ccp4 J. P. Abrahams pack_c.c compression offers.  At the

I used this when I was a postdoc but had forgotten about it.  It doesn't 
build (?) as far as I can tell in the default ccp4 install. I found it, 
and a Fortran program, in the ipdisp directory, tried make and got this 
rather cryptic message:

You need to make mosflm-bits in the library for the image-packing stuff
exit 1
make: *** [/sw/share/xtal/ccp4-6.0.2/lib/libccp4.a(pack_c.o)] Error 1

I can't find anything called mosflm-bits.  This bytes.  I can't find
any documentation either.

Bill


 


Re: [ccp4bb] diffraction images images/jpeg2000

2007-08-20 Thread Winter, G (Graeme)
Hi,

I looked at jpeg2000 as a compression for diffraction images for
archiving purposes - it works well but is *SLOW*. It's designed with
compressing a single image in mind, not the several hundred
typical for our work. There is also no place to put the header.

Bzip2 works pretty much as well and is standard, but again slow. This is
what people mostly seem to use for putting diffraction images on the
web, particularly the JCSG.

The ccp4 pack format, which has been around for a very long time, works
very well and is jolly quick, and is supported natively in a number of
data processing packages (Mosflm, XDS). Likewise there is a new
compression being used for the Pilatus detector which is quicker again.
These two have the advantage of being designed for diffraction images
and with speed in mind.
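
The speed/ratio trade-off is easy to get a feel for with standard-library
codecs. A sketch follows, using bz2 and zlib as stand-ins, since neither
the ccp4 pack format nor the Pilatus codec is available as a Python
module; the test file name is an assumption:

--- codec_race.py (sketch) ---
#!/usr/bin/env python
# Compare ratio and wall-clock time for two general-purpose codecs.
import bz2
import time
import zlib

data = open("frame_001.img", "rb").read()   # assumed test frame
for name, compress in [("bzip2", bz2.compress), ("zlib", zlib.compress)]:
    start = time.time()
    packed = compress(data)
    ratio = len(data) / float(len(packed))
    print("%s: %.2f:1 in %.3f s" % (name, ratio, time.time() - start))
---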

So there are plenty of good compression schemes out there - and if you
use CBF these can be supported natively in the image standard... So you
don't even need to know or care...

Just my 2c on this one.

Cheers,

Graeme



[ccp4bb] diffraction images images/jpeg2000

2007-08-17 Thread Maneesh Yadav
FWIW, I don't agree with storing image data; I don't think the images 
justify the cost of storage even remotely (some people debate the value of 
the structures themselves)... but if you want to do it anyway, maybe we 
should use a format like jpeg2000.

Last time I checked, none of the major image processing suites used it, but it 
is a very impressive and mature format that (I think) would be suitable for 
diffraction images.  If anyone is up for experimenting, you can get a nice 
suite of tools from Kakadu (just google kakadu + jpeg2000).