[ccp4bb] Processing compressed diffraction images?
All - No doubt this topic has come up before on the BB: I'd like to ask about the current capabilities of the various integration programs (in practice we use only MOSFLM and XDS) for reading compressed diffraction images from synchrotrons. AFAICS XDS has limited support for reading compressed images (TIFF format from the MARCCD detector and CCP4 compressed format from the Oxford Diffraction CCD); MOSFLM doesn't seem to support reading compressed images at all (I'm sure Harry will correct me if I'm wrong about this!). I'm really thinking about gzipped files here: bzip2 no doubt gives marginally smaller files but is very slow.

Currently we bring back uncompressed images, but it seems to me that this is not the most efficient way of doing things - or is it just that my expectation that it's more efficient to read compressed images and uncompress in memory is not realised in practice? For example, the AstexViewer molecular viewer software currently reads gzipped CCP4 maps directly and gunzips them in memory; this improves the response time by a modest factor of ~1.5, but this is because electron density maps are 'dense' from a compression point of view; X-ray diffraction images tend to have much more 'empty space' and the compression factor is usually considerably higher (as much as 10-fold).

On a recent trip we collected more data than we anticipated, and the uncompressed data no longer fitted on our USB disk (the data is backed up to the USB disk as it's collected), so we would have definitely benefited from compression! However, file size is *not* the issue: disk space is cheap after all. My point is that compressed images surely require much less disk I/O to read. In this respect, bringing back compressed images and then uncompressing back to a local disk completely defeats the object of compression - you actually more than double the I/O instead of reducing it! We see this when we try to process the ~150 datasets that we bring back on our PC cluster: the disk I/O completely cripples the disk server machine (and everyone who's trying to use it at the same time!) unless we're careful to limit the number of simultaneous jobs. When we routinely start to use the Pilatus detector on the beamlines this is going to be even more of an issue.

Basically we have plenty of processing power from the cluster: the disk I/O is the bottleneck. Now you could argue that we should spread the load over more disks or maybe spend more on faster disk controllers, but the whole point about disks is that they're cheap, we don't need the extra I/O bandwidth for anything else, and you shouldn't need to spend a fortune, particularly if there are ways of making the software more efficient, which after all will benefit everyone.

Cheers

-- Ian
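[For illustration, a minimal Python sketch of the in-memory approach Ian describes: the gzipped image is decompressed entirely in memory, so the only disk traffic is the compressed bytes. The file name, image dimensions, pixel type and header size here are hypothetical placeholders, not a real detector format.]

import gzip
import numpy as np

def read_gzipped_image(path, shape=(3072, 3072), header_bytes=512):
    """Read a gzipped detector image, decompressing in memory.

    Only the compressed file is read from disk; the expanded pixel
    data never touches the filesystem. Shape, dtype and header size
    are stand-ins for whatever the real detector format specifies.
    """
    with gzip.open(path, "rb") as f:
        raw = f.read()  # disk I/O: compressed bytes only
    pixels = np.frombuffer(raw[header_bytes:], dtype=np.uint16)
    return pixels.reshape(shape)

# img = read_gzipped_image("mydata_001.img.gz")  # hypothetical file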
Re: [ccp4bb] Processing compressed diffraction images?
d*TREK will process compressed images with the following extensions: .gz .bz2 .Z .pck and .cbf

-- Jim Pflugrath
Re: [ccp4bb] Processing compressed diffraction images?
Entering "xds gzip" at www.ixquick.com came up with http://www.mpimf-heidelberg.mpg.de/~kabsch/xds/html_doc/xds_parameters.html: "To save space it is allowed to compress the images by using the UNIX compress, gzip, or bzip2 routines. On data processing XDS will automatically recognize and expand the compressed image files. The file name extensions (.Z, .z, .gz, .bz2) due to the compression routines should not be included in the generic file name template."

I seem to remember that mosflm also supports gzipped images, but didn't find a reference within 2 minutes. I'm surprised to hear that you get such a high compression rate with mccd images.

Cheers, Tim

--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
Re: [ccp4bb] Processing compressed diffraction images?
Jim, thanks for the info. At present we use d*TREK mostly only for in-house data (Saturn, Jupiter and R-axis) so the data collection rate is much lower, and in any case we would gain nothing by compressing them since the I/O is the same whether it's gzip reading in the images or d*TREK. Our problem is that our people bring back a large number of datasets (> 150) from each synchrotron trip, dump them all on the file server and then try to process them all at the same time!

Cheers

-- Ian
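[As an aside, the "limit the number of simultaneous jobs" mitigation mentioned in the original post can be as simple as a bounded worker pool. A sketch, in which the processing script name and dataset paths are hypothetical placeholders.]

from concurrent.futures import ThreadPoolExecutor
import subprocess

MAX_SIMULTANEOUS = 4  # tune to what the disk server can sustain

def process_dataset(dataset_dir):
    # Placeholder: launch whatever integration pipeline is in use.
    subprocess.run(["process_xtal.sh", dataset_dir], check=True)

datasets = [f"/data/trip42/xtal{i:03d}" for i in range(150)]  # hypothetical

with ThreadPoolExecutor(max_workers=MAX_SIMULTANEOUS) as pool:
    list(pool.map(process_dataset, datasets))  # at most 4 jobs hit the disks at once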
Re: [ccp4bb] Processing compressed diffraction images?
Hi Ian

I've looked briefly at implementing gunzip in Mosflm in the past, but never really pursued it. It could probably be done when I have some free time, but who knows when that will be? gzip'ing one of my standard test sets gives around a 40-50% reduction in size, bzip2 ~60-70%. The speed of doing the compression is important too, and compressing is considerably slower than uncompressing (since with uncompressing you know where you are going and have the instructions, whereas with compressing you have to work it all out as you proceed).

There are several ways of writing compressed images that (I believe) all the major processing packages have implemented - for example, Jan Pieter Abrahams has one which has been used for Mar images for a long time, and CBF has more than one. There are very good reasons for all detectors to write their images using CBFs with some kind of compression (I think that all new MX detectors at Diamond, for example, are required to be able to). Pilatus images are written using a fast compressor and read using a fast decompressor (in Mosflm and XDS, anyway - I have no idea about d*Trek or HKL, but imagine they would do the job every bit as well) - so this goes some way towards dealing with that particular problem: the image files aren't as big as you'd expect from their physical size and 20-bit dynamic range (from the 6M they're roughly 6MB, rather than 6MB * 2.5). So that seems about as good as you'd get from bzip2 anyway.

I'd be somewhat surprised to see a non-lossy fast algorithm that could give you 10-fold compression with normal MX type images - the empty space between Bragg maxima is full of detail (noise, diffuse scatter). If you had a truly flat background you could get much better compression, of course.

Harry
--
Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, Cambridge, CB2 0QH
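[For context on the "fast compressor" Harry refers to: a minimal Python sketch, from memory, of the byte-offset scheme CBF uses for Pilatus images. Each pixel is stored as the difference from its predecessor - one byte for the small deltas that dominate the smooth background, with escape codes for the rare large jumps at Bragg spots. An illustrative reimplementation under those assumptions, not the CBFlib code.]

import struct

def byte_offset_compress(pixels):
    """Sketch of CBF-style byte-offset compression.

    Deltas between successive pixels are usually tiny in the smooth
    background, so most pixels cost one byte; only large jumps (e.g.
    Bragg spots) trigger the 3- or 7-byte escape sequences. 32-bit
    deltas are ample for a 20-bit dynamic range.
    """
    out = bytearray()
    prev = 0
    for v in pixels:
        delta = v - prev
        if -127 <= delta <= 127:
            out += struct.pack("<b", delta)                # 1 byte
        elif -32767 <= delta <= 32767:
            out += b"\x80" + struct.pack("<h", delta)      # escape + 2 bytes
        else:
            out += b"\x80" + struct.pack("<h", -32768) + struct.pack("<i", delta)
        prev = v
    return bytes(out)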
Re: [ccp4bb] Processing compressed diffraction images?
Hi Tim, thanks for that, sorry yes I missed that page. But I'm still not clear: is it uncompressing to disk or is it doing it in memory? I assume the latter: if the former then obviously nothing is gained. You're right about the compression factor, it's more like a factor of 2 or 3; I should have looked at the image in question, as the one I picked had no spots!

Cheers

-- Ian
Re: [ccp4bb] Processing compressed diffraction images?
Hi Harry

Thanks for the info. Speed of compression is not an issue I think, since compression and backing up of the images are done asynchronously with data collection, and currently backing up easily keeps up, so I think compression straight to the backup disk would too. As you saw from my reply to Tim, my compression factor of 10 was a bit optimistic; for images with spots on them (!) it's more like 2 or 3 with gzip, as you say. I found an old e-mail from James Holton where he suggested lossy compression for diffraction images (as long as it didn't change the F's significantly!) - I'm not sure whether anything came of that!

Cheers

-- Ian
Re: [ccp4bb] Processing compressed diffraction images?
Compression methods such as gzip are unlikely to be optimum for diffraction images, and AFAIK the methods in CBF are better (I think Jim Pflugrath did some races a long time ago, and I guess others have too). There is no reason for data acquisition software ever to write uncompressed images (let alone having 57 different ways of doing it).

Phil
Re: [ccp4bb] Processing compressed diffraction images?
The results from compressing a diffraction image must vary quite a bit on a case-by-case basis. I looked into it a long time ago using images from a few datasets from 2 different projects. Compress was quite a bit faster than gzip or bzip2 in these tests; it also delivered the least compression. gzip and bzip2 were about the same speed (or lack thereof), but while the difference in speed was marginal, bzip2 delivered a 20-30% size improvement over gzip. The tests were with images of diffracting crystals, the diffraction extending to the edge of the detector.

Regards,
Thierry
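[To reproduce this kind of comparison on one's own images, a sketch using Python's built-in compressors; lzma is the algorithm behind the xz tool suggested below, while classic Unix compress (.Z) has no stdlib counterpart and is omitted. The image file name is a placeholder.]

import bz2, gzip, lzma, time

def benchmark(path):
    """Time each compressor on one image and report the ratio achieved."""
    with open(path, "rb") as f:
        data = f.read()
    for name, compress in [("gzip", gzip.compress),
                           ("bzip2", bz2.compress),
                           ("xz", lzma.compress)]:
        t0 = time.perf_counter()
        packed = compress(data)
        dt = time.perf_counter() - t0
        print(f"{name:6s} ratio {len(data) / len(packed):5.2f} in {dt:6.3f} s")

# benchmark("mydata_001.img")  # hypothetical image file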
[ccp4bb] Processing compressed diffraction images?
Yet another compressor to consider: xz (http://tukaani.org/xz/). HTH
Re: [ccp4bb] Processing compressed diffraction images?
Something I have been playing with recently that might address your problem in a way you like is SquashFS: http://squashfs.sourceforge.net/

SquashFS is a read-only compressed file system. It uses gzip --best, which is comparable to bzip2 for diffraction images (in my experience). Basically, it works a lot like burning to a CD. You run mksquashfs to create the compressed image and then mount -o loop it. Then voila! You can access everything in the archive as if it were an uncompressed file. Disk I/O then consists of compressed data (decompression is done by the kernel), and so does network traffic if you play a clever trick: share the compressed file over NFS and mount -o loop it locally. This has much bigger advantages than you might realize, because most of the NFS traffic that brings a file server to its knees is the tiny little writes that are done to update access times. NFS writes (and RAID writes) are all really expensive, and you can actually gain a considerable performance increase by just mounting your data disks read-only (or by putting noatime as a mount option).

Anyway, SquashFS is not as slick as the transparent compression you can get with HFS or NTFS, but I personally like the fact that it is read-only (good for data). For real-time backup, mksquashfs does support appending to an existing archive, so you can probably build your squashfs file on the usb disk at the beamline (even if the beamline computer kernels can't mount it). However, if you MUST have your processing files mixed amongst your images, you can use unionfs to overlay a writable file system with the read-only one. Depends on how cooperative your IT guys are...

-James Holton
MAD Scientist