[ccp4bb] Processing compressed diffraction images?

2010-05-06 Thread Ian Tickle
All -

No doubt this topic has come up before on the BB: I'd like to ask
about the current capabilities of the various integration programs (in
practice we use only MOSFLM & XDS) for reading compressed diffraction
images from synchrotrons.  AFAICS XDS has limited support for reading
compressed images (TIFF format from the MARCCD detector and CCP4
compressed format from the Oxford Diffraction CCD); MOSFLM doesn't
seem to support reading compressed images at all (I'm sure Harry will
correct me if I'm wrong about this!).  I'm really thinking about
gzipped files here: bzip2 no doubt gives marginally smaller files but
is very slow.  Currently we bring back uncompressed images, but it
seems to me that this is not the most efficient way of doing things -
or is it just that my expectation that it's more efficient to read
compressed images and uncompress in memory is not realised in practice?
For example, the AstexViewer molecular viewer software currently reads
gzipped CCP4 maps directly and gunzips them in memory; this improves
the response time by a modest factor of ~1.5, but that is because
electron density maps are 'dense' from a compression point of view;
X-ray diffraction images tend to have much more 'empty space' and the
compression factor is usually considerably higher (as much as
10-fold).
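
(By 'uncompress in memory' I mean something like the following - a rough
Python sketch with a made-up file name and layout, just to illustrate the
idea; the integration programs would of course do the equivalent in
compiled code:)

import gzip
import numpy as np

# Hypothetical gzipped image: 16-bit unsigned pixels after a 512-byte header.
# Read and decompress entirely in memory - no uncompressed copy ever hits disk.
with gzip.open("xtal1_0001.img.gz", "rb") as f:
    raw = f.read()                      # one sequential read of the *compressed* bytes

header = raw[:512]
pixels = np.frombuffer(raw[512:], dtype=np.uint16)
print("decompressed", pixels.size, "pixels straight into memory")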

On a recent trip we collected more data than we anticipated & the
uncompressed data no longer fitted on our USB disk (the data is backed
up to the USB disk as it's collected), so we would definitely have
benefited from compression!  However, file size is *not* the issue:
disk space is cheap after all.  My point is that compressed images
surely require much less disk I/O to read.  In this respect bringing
back compressed images and then uncompressing them back to a local disk
completely defeats the object of compression - you actually more than
double the I/O instead of reducing it!  We see this when we try to
process the ~150 datasets that we bring back on our PC cluster and the
disk I/O completely cripples the disk server machine (and everyone
who's trying to use it at the same time!) unless we're careful to
limit the number of simultaneous jobs.  When we routinely start to use
the Pilatus detector on the beamlines this is going to be even more of
an issue.  Basically we have plenty of processing power from the
cluster: the disk I/O is the bottleneck.  Now you could argue that we
should spread the load over more disks, or maybe spend more on faster
disk controllers, but the whole point about disks is that they're cheap,
we don't need the extra I/O bandwidth for anything else, and you
shouldn't need to spend a fortune, particularly if there are ways of
making the software more efficient, which after all will benefit
everyone.
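
(A back-of-the-envelope illustration of the I/O argument - the 10 MB image
size and 3-fold compression factor below are just assumed round numbers:)

# Rough I/O budget per image, assuming (hypothetically) 10 MB uncompressed
# and a 3-fold gzip compression factor.
uncompressed = 10.0              # MB
compressed = uncompressed / 3    # ~3.3 MB

# (a) current practice: bring back uncompressed, read it once to process
io_read_uncompressed = uncompressed                                    # 10 MB
# (b) bring back compressed, uncompress to local disk, then process
io_uncompress_to_disk = compressed + uncompressed + uncompressed      # ~23.3 MB
# (c) bring back compressed, integration program uncompresses in memory
io_in_memory = compressed                                              # ~3.3 MB

print(f"(a) {io_read_uncompressed:.1f} MB  "
      f"(b) {io_uncompress_to_disk:.1f} MB  (c) {io_in_memory:.1f} MB")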

Cheers

-- Ian


Re: [ccp4bb] Processing compressed diffraction images?

2010-05-06 Thread Jim Pflugrath
d*TREK will process compressed images with the following extensions: .gz,
.bz2, .Z, .pck and .cbf.
 



Re: [ccp4bb] Processing compressed diffraction images?

2010-05-06 Thread Tim Gruene
Entering "xds gzip" at www.ixquick.com came up with
http://www.mpimf-heidelberg.mpg.de/~kabsch/xds/html_doc/xds_parameters.html:

"To save space it is allowed to compress the images by using the UNIX compress,
gzip, or bzip2 routines. On data processing XDS will automatically recognize and
expand the compressed images files. The file name extensions (.Z, .z, .gz, .bz2)
due to the compression routines should not be included in the generic file name
template."
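
(To illustrate that last point about the template with a couple of
hypothetical frame names - not real XDS input, just a Python sketch of how
the generic name given to NAME_TEMPLATE_OF_DATA_FRAMES would be formed:)

import re

# Hypothetical gzipped frames as collected at the beamline:
frames = ["xtal1_0001.img.gz", "xtal1_0002.img.gz", "xtal1_0003.img.gz"]

def xds_template(name):
    """Strip the compression suffix and replace the frame number with '?'s,
    as the XDS documentation quoted above requires."""
    name = re.sub(r"\.(gz|bz2|Z|z)$", "", name)            # drop .gz etc.
    return re.sub(r"\d+(?=\.img$)", lambda m: "?" * len(m.group()), name)

print("NAME_TEMPLATE_OF_DATA_FRAMES=", xds_template(frames[0]))
# -> NAME_TEMPLATE_OF_DATA_FRAMES= xtal1_????.img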

I seem to remember that mosflm also supports gzipped images, but I didn't find a
reference within 2 minutes.

I'm surprised to hear that you get such a high compression rate with mccd
images. 

Cheers, Tim


-- 
--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A





Re: [ccp4bb] Processing compressed diffraction images?

2010-05-06 Thread Ian Tickle
Jim, thanks for the info.  At present we use d*TREK mostly only for
in-house data (Saturn, Jupiter & R-axis) so the data collection rate
is much lower, and in any case we would gain nothing by compressing
them since the I/O is the same whether it's gzip or d*TREK reading in
the images.  Our problem is that our people bring back a large number of
datasets (> 150) from each synchrotron trip, dump them all on the file
server and then try to process them all at the same time!

Cheers

-- Ian





Re: [ccp4bb] Processing compressed diffraction images?

2010-05-06 Thread Harry Powell
Hi Ian

I've looked briefly at implementing gunzip in Mosflm in the past, but never
really pursued it. It could probably be done when I have some free time, but
who knows when that will be? gzip'ing one of my standard test sets gives around
a 40-50% reduction in size, bzip2 ~60-70%. The speed of doing the compression
is important too: compressing is considerably slower than uncompressing (since
with uncompressing you know where you are going and have the instructions,
whereas with compressing you have to work it all out as you proceed).
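
(If anyone wants to check the ratios and timings on their own images, a quick
and dirty Python script along these lines will do it - the file name is
hypothetical:)

import bz2, gzip, time

IMAGE = "xtal1_0001.img"          # hypothetical uncompressed test image

with open(IMAGE, "rb") as f:
    data = f.read()

for name, compress in (("gzip", gzip.compress), ("bzip2", bz2.compress)):
    t0 = time.time()
    packed = compress(data)
    dt = time.time() - t0
    print(f"{name:6s} {len(packed)/len(data):5.1%} of original size, "
          f"{dt:.2f} s to compress")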

There are several ways of writing compressed images that (I believe) all the 
major processing packages have implemented - for example, Jan Pieter Abrahams 
has one which has been used for Mar images for a long time, and CBF has more 
than one. There are very good reasons for all detectors to write their images 
using CBFs with some kind of compression (I think that all new MX detectors at 
Diamond, for example, are required to be able to). 

Pilatus images are written using a fast compressor and read (in Mosflm and XDS, 
anyway - I have no idea about d*Trek or HKL, but imagine they would do the job 
every bit as well) using a fast decompressor - so this goes some way towards 
dealing with that particular problem - the image files aren't as big as you'd 
expect from their physical size and 20-bit dynamic range (from the 6M they're 
roughly 6MB, rather than 6MB * 2.5). So that seems about as good as you'd get 
from bzip2 anyway.
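
(For what it's worth, the fast compression used for Pilatus CBFs is, as far
as I recall, a simple byte-offset/delta scheme; below is a much simplified,
untested Python sketch of the idea, not the actual CBF algorithm:)

import struct

def byte_offset_compress(pixels):
    """Simplified delta ('byte offset') compression: each pixel is stored as
    the difference from the previous one; small differences take 1 byte,
    large ones an escape byte plus 4 bytes."""
    out = bytearray()
    prev = 0
    for p in pixels:
        delta = p - prev
        if -127 <= delta <= 127:
            out += struct.pack("b", delta)               # 1 byte, common case
        else:
            out += b"\x80" + struct.pack("<i", delta)    # escape + 4-byte delta
        prev = p
    return bytes(out)

# Mostly flat background plus one strong spot: most pixels cost ~1 byte each,
# versus 4 bytes each if stored as plain 32-bit integers.
pixels = [10, 11, 9, 12, 10, 11, 13, 5000, 12, 10]
packed = byte_offset_compress(pixels)
print(len(packed), "bytes for", len(pixels), "pixels")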

I'd be somewhat surprised to see a non-lossy fast algorithm that could give you 
10-fold compression with normal MX type images - the empty space between 
Bragg maxima is full of detail (noise, diffuse scatter). If you had a truly 
flat background you could get much better compression, of course. 


Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, 
Cambridge, CB2 0QH


Re: [ccp4bb] Processing compressed diffraction images?

2010-05-06 Thread Ian Tickle
Hi Tim, thanks for that - sorry, yes I missed that page.  But I'm still
not clear: is it uncompressing to disk or is it doing it in memory?  I
assume the latter: if the former then obviously nothing is gained.
You're right about the compression factor, it's more like a factor of
2 or 3; I should have looked at the image in question, as the one I
picked had no spots!

Cheers

-- Ian





Re: [ccp4bb] Processing compressed diffraction images?

2010-05-06 Thread Ian Tickle
Hi Harry

Thanks for the info.  Speed of compression is not an issue I think,
since compression & backing up of the images are done asynchronously
with data collection, and currently backing up easily keeps up, so I
think compression straight to the backup disk would too.  As you saw
from my reply to Tim, my compression factor of 10 was a bit optimistic;
for images with spots on them (!) it's more like 2 or 3 with gzip, as
you say.

I found an old e-mail from James Holton where he suggested lossy
compression for diffraction images (as long as it didn't change the
F's significantly!) - I'm not sure whether anything came of that!

Cheers

-- Ian


Re: [ccp4bb] Processing compressed diffraction images?

2010-05-06 Thread Phil Evans
Compression methods such as gzip are unlikely to be optimum for diffraction
images, and AFAIK the methods in CBF are better (I think Jim Pflugrath did some
races a long time ago, and I guess others have too). There is no reason for
data acquisition software ever to write uncompressed images (let alone having
57 different ways of doing it).

Phil


Re: [ccp4bb] Processing compressed diffraction images?

2010-05-06 Thread Fischmann, Thierry
The results from compressing a diffraction image must vary quite a bit on a
case-by-case basis.

I looked into it a long time ago using images from a few datasets from 2
different projects. Compress was quite a bit faster than gzip or bzip2 in these
tests. It also delivered the least compression. gzip and bzip2 were about the
same speed (or lack thereof), but while the difference in speed was marginal,
bzip2 delivered a 20-30% size improvement over gzip.

The tests were with images of diffracting crystals, the diffraction extending
to the edge of the detector.

Regards,

Thierry


[ccp4bb] Processing compressed diffraction images?

2010-05-06 Thread Lepore, Bryan
yet another compressor to consider: xz

http://tukaani.org/xz/

hth


Re: [ccp4bb] Processing compressed diffraction images?

2010-05-06 Thread James Holton
Something I have been playing with recently that might address your 
problem in a way you like is SquashFS:

http://squashfs.sourceforge.net/

SquashFS is a read-only compressed file system.  It uses gzip --best, 
which is comparable to bzip2 for diffraction images (in my experience).  
Basically, it works a lot like burning to a CD.  You run mksquashfs to 
create the compressed image and then mount -o loop it.  Then voila!  
You can access everything in the archive as if it were an uncompressed 
file.  Disk I/O then consists of compressed data (decompression is done 
by the kernel), and so does network traffic if you play a clever trick: 
share the compressed file over NFS and mount -o loop it locally.  This 
has much bigger advantages than you might realize, because most of the 
NFS traffic that brings a file server to its knees is the tiny little 
writes that are done to update access times.  NFS writes (and RAID 
writes) are all really expensive, and you can actually gain a 
considerable performance increase just by mounting your data disks 
read-only (or by putting noatime as a mount option).
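
(For anyone who wants to try it, the whole workflow is only a couple of
commands; here is a minimal Python wrapper around them with made-up paths -
the mksquashfs and mount invocations are from memory, so check the man pages:)

import subprocess

DATA_DIR = "/data/trip_2010_05"              # hypothetical directory of images
ARCHIVE  = "/backup/trip_2010_05.squashfs"
MOUNT_PT = "/mnt/trip_2010_05"

# Build the read-only compressed archive (mksquashfs appends to ARCHIVE if it
# already exists, which is handy for incremental backup during collection).
subprocess.run(["mksquashfs", DATA_DIR, ARCHIVE], check=True)

# Loop-mount it; everything inside is then readable as ordinary files, with
# decompression done by the kernel (needs root, hence sudo).
subprocess.run(["sudo", "mount", "-o", "loop,ro", ARCHIVE, MOUNT_PT], check=True)

# Now point MOSFLM/XDS/d*TREK at /mnt/trip_2010_05/... as usual.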


Anyway, SquashFS is not as slick as the transparent compression you can 
get with HFS or NTFS, but I personally like the fact that it is 
read-only (good for data).  For real-time backup, mksquashfs does 
support appending to an existing archive, so you can probably build 
your squashfs file on the usb disk at the beamline (even if the beamline 
computer kernels can't mount it).  However, if you MUST have your 
processing files mixed amongst your images, you can use unionfs to 
overlay a writable file system with the read-only one.  Depends on how 
cooperative your IT guys are...


-James Holton
MAD Scientist
