Re: [ccp4bb] image compression
Hi James, Regarding the suggestion of lossy compression, it is really hard to comment without having a good idea of the real cost of doing this. So, I have a suggestion:

- grab a bag of JCSG data sets, which we know should all be essentially OK
- squash then unsquash them with your macguffin, perhaps randomizing them as to whether A or B is squashed
- process them with Elves / xia2 / autoPROC (something which is reproducible)
- pop the results into pdb_redo

Then compare what comes out. Ultimately, adding noise may (or may not) make a measurable difference to the final refinement - this may be a way of telling whether it does or doesn't. Why, however, would I have any reason to worry? Because the noise being added is not really random - it will be compression artifacts. This could have a subtle effect on how the errors are estimated and so on. However, you can hum and haw about this for a decade without reaching a conclusion. Here it is something which, in all honesty, we can actually evaluate, so is it worth giving it a go? If the results were / are persuasive (i.e. a report on the use of lossy compression in transmission and storage of X-ray diffraction data was actually read and endorsed by the community), this would make it much more worthwhile to consider for inclusion in e.g. cbflib. I would, however, always encourage (if possible) that the original raw data be kept somewhere on disk in an unmodified form - I am not a fan of one-way computational processes with unique data. Thoughts anyone? Cheerio, Graeme

On 7 November 2011 17:30, James Holton jmhol...@lbl.gov wrote: At the risk of sounding like another poll, I have a pragmatic question for the methods development community: Hypothetically, assume that there was a website where you could download the original diffraction images corresponding to any given PDB file, including early datasets that were from the same project, but because of smeary spots or whatever, couldn't be solved. There might even be datasets with unknown PDB IDs because that particular project never did work out, or because the relevant protein sequence has been lost. Remember, few of these datasets will be less than 5 years old if we try to allow enough time for the original data collector to either solve it or graduate (and then cease to care). Even for the final dataset, there will be a delay, since the half-life between data collection and coordinate deposition in the PDB is still ~20 months. Plenty of time to forget. So, although the images were archived (probably named test and in a directory called john) it may be that the only way to figure out which PDB ID is the right answer is by processing them and comparing to all deposited Fs. Assume this was done. But there will always be some datasets that don't match any PDB. Are those interesting? What about ones that can't be processed? What about ones that can't even be indexed? There may be a lot of those! (hypothetically, of course). Anyway, assume that someone did go through all the trouble to make these datasets available for download, just in case they are interesting, and annotated them as much as possible. There will be about 20 datasets for any given PDB ID. Now assume that for each of these datasets this hypothetical website has two links, one for the raw data, which will average ~2 GB per wedge (after gzip compression, taking at least ~45 min to download), and a second link for a lossy compressed version, which is only ~100 MB/wedge (2 min download).
When decompressed, the images will visually look pretty much like the originals, and generally give you very similar Rmerge, Rcryst, Rfree, I/sigma, anomalous differences, and all other statistics when processed with contemporary software. Perhaps a bit worse. Essentially, lossy compression is equivalent to adding noise to the images. Which one would you try first? Does lossy compression make it easier to hunt for interesting datasets? Or is it just too repugnant to have modified the data in any way shape or form ... after the detector manufacturer's software has corrected it? Would it suffice to simply supply a couple of example images for download instead? -James Holton MAD Scientist
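James's point that lossy compression is essentially equivalent to adding noise to the images can be made concrete with a toy example. The sketch below is not any codec actually proposed in this thread; it simply assumes photon counts are stored on a rounded square-root scale, so the rounding error stays well below the Poisson noise already in the pixel values.

import numpy as np

rng = np.random.default_rng(0)
true_counts = rng.poisson(lam=200.0, size=100_000)   # synthetic detector pixels, ~200 photons each

# Store sqrt(counts) rounded to a fixed step: the rounding error then scales like
# the Poisson sigma sqrt(I), so it is a constant fraction of the noise at every intensity.
step = 0.5                                            # quantization step in sqrt-counts (assumed)
encoded = np.round(np.sqrt(true_counts) / step).astype(np.uint16)   # what would go on disk
decoded = (encoded * step) ** 2                       # what a user would read back

quant_rms = np.sqrt(np.mean((decoded - true_counts) ** 2))
poisson_rms = np.sqrt(true_counts.mean())
print(f"rms quantization error: {quant_rms:.1f} counts")
print(f"rms Poisson noise:      {poisson_rms:.1f} counts")

With these assumed settings the rounding error is a few counts against roughly 14 counts of Poisson noise, which is why processed statistics would come out only "perhaps a bit worse". Graeme's caveat still stands: the added error is deterministic rather than random, which is exactly what an evaluation of the kind he proposes would need to test.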
Re: [ccp4bb] weight matrix and R-FreeR gap optimization
Hi James, That is not exactly a lot of info to decide the best weight. The optimal weight is (very loosely) resolution dependent. At normal resolutions the optimal matrix weight is usually well below 1.0. Start at 0.3 and try a few weights to see what works best for your data. To close the R-free gap you can also try to optimize other refinement parameters such as NCS restraints, B-factor model (and restraint weight). Jelly body restraints sometimes work really well to keep the R-free gap sensible, especially at low resolution. Cheers, Robbie From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of james09 pruza Sent: Tuesday, November 08, 2011 06:40 To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] weight matrix and R-FreeR gap optimization Dear ccp4bbers, I wonder if someone can help me defining proper weight matrix term in Refmac5 to lower the R-FreeR gap. The log file indicates weight matrix of 1.98 with a gap of 7. Thanks for suggestions in advance. James
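Robbie's "try a few weights" advice is easy to script. The sketch below is only an illustration: the input/output file names are placeholders, "ncyc" and "weight matrix" are standard Refmac5 keywords, and the rest of the invocation should be checked against your own CCP4 setup before relying on it.

import subprocess

for w in [0.1, 0.3, 0.5, 1.0, 2.0]:
    keywords = f"ncyc 10\nweight matrix {w}\nend\n"
    with open(f"refmac_w{w}.log", "w") as log:
        # Standard CCP4-style invocation: logical file names on the command line,
        # keywords on standard input. Adapt paths to your project.
        subprocess.run(
            ["refmac5", "HKLIN", "in.mtz", "HKLOUT", f"out_w{w}.mtz",
             "XYZIN", "in.pdb", "XYZOUT", f"out_w{w}.pdb"],
            input=keywords, text=True, stdout=log, check=True)
    print(f"finished weight {w}; compare R and R-free at the end of refmac_w{w}.log")

Picking the weight that gives the lowest R-free (and a sensible R-free minus R gap) is the usual criterion; the absolute values will of course depend on resolution, as Eleanor notes below.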
Re: [ccp4bb] image compression
Hi James, I see no real need for lossy compression datasets. They may be useful for demonstration purposes, and to follow synchrotron data collection remotely. But for processing I need the real data. It is my experience that structure solution, at least in the difficult cases, depends on squeezing out every bit of scattering information from the data, as much as is possible with the given software. Using a lossy-compression dataset in this situation would give me the feeling that, if structure solution does not work out, I'll have to re-do everything with the original data - and that would be double work. Better not start going down that route. The CBF byte compression puts even a 20-bit detector pixel into a single byte, on average. These frames can be further compressed, in the case of Pilatus fine-slicing frames, using bzip2, almost down to the level of entropy in the data (since there are so many zero pixels). And that would be lossless. (A simplified sketch of this byte-offset idea follows after this message.) Storing lossily-compressed datasets would of course not double the diskspace needed, but would significantly raise the administrative burden.

Just to point out my standpoint in this whole discussion about storage of raw data: I've been storing our synchrotron datasets on disks since 1999. The amount of money we spend per year for this purpose is constant (less than 1000€). This is possible because the price of a GB of disk space drops faster than the amount of data per synchrotron trip rises. So if the current storage is full (about every 3 years), we set up a bigger RAID (plus a backup RAID); the old data, after copying over, always consumes only a fraction of the space on the new RAID. So I think the storage cost is actually not the real issue - rather, the real issue has a strong psychological component. People a) may not realize that the software they use is constantly being improved, and that needs data which cover all the corner cases; b) often do not wish to give away something because they feel it might help their competitors, or expose their faults. best, Kay (XDS co-developer)

Original Message: Date: Mon, 7 Nov 2011 09:30:11 -0800 From: James Holton jmhol...@lbl.gov Subject: image compression At the risk of sounding like another poll, I have a pragmatic question for the methods development community: Hypothetically, assume that there was a website where you could download the original diffraction images corresponding to any given PDB file, including early datasets that were from the same project, but because of smeary spots or whatever, couldn't be solved. There might even be datasets with unknown PDB IDs because that particular project never did work out, or because the relevant protein sequence has been lost. Remember, few of these datasets will be less than 5 years old if we try to allow enough time for the original data collector to either solve it or graduate (and then cease to care). Even for the final dataset, there will be a delay, since the half-life between data collection and coordinate deposition in the PDB is still ~20 months. Plenty of time to forget. So, although the images were archived (probably named test and in a directory called john) it may be that the only way to figure out which PDB ID is the right answer is by processing them and comparing to all deposited Fs. Assume this was done. But there will always be some datasets that don't match any PDB. Are those interesting? What about ones that can't be processed?
What about ones that can't even be indexed? There may be a lot of those! (hypothetically, of course). Anyway, assume that someone did go through all the trouble to make these datasets available for download, just in case they are interesting, and annotated them as much as possible. There will be about 20 datasets for any given PDB ID. Now assume that for each of these datasets this hypothetical website has two links, one for the raw data, which will average ~2 GB per wedge (after gzip compression, taking at least ~45 min to download), and a second link for a lossy compressed version, which is only ~100 MB/wedge (2 min download). When decompressed, the images will visually look pretty much like the originals, and generally give you very similar Rmerge, Rcryst, Rfree, I/sigma, anomalous differences, and all other statistics when processed with contemporary software. Perhaps a bit worse. Essentially, lossy compression is equivalent to adding noise to the images. Which one would you try first? Does lossy compression make it easier to hunt for interesting datasets? Or is it just too repugnant to have modified the data in any way shape or form ... after the detector manufacturer's software has corrected it? Would it suffice to simply supply a couple of example images for download instead? -James Holton MAD Scientist
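Kay's remark about CBF byte compression refers to delta encoding: most neighbouring pixels differ by an amount that fits in one signed byte, and only the rare large jumps need more. The following is a simplified sketch of that idea; it is not the exact CBF_BYTE_OFFSET format, and the escape convention used here is an assumption for illustration only.

import numpy as np

def byte_offset_encode(pixels):
    """Simplified delta/escape encoder in the spirit of CBF byte-offset compression.
    Each pixel is stored as the difference from the previous one: one signed byte
    if it fits, otherwise an escape byte (-128) followed by a 4-byte difference."""
    out = bytearray()
    prev = 0
    for v in np.asarray(pixels, dtype=np.int64).ravel():
        delta = int(v) - prev
        if -127 <= delta <= 127:
            out += delta.to_bytes(1, "little", signed=True)
        else:
            out += (-128).to_bytes(1, "little", signed=True)
            out += delta.to_bytes(4, "little", signed=True)
        prev = int(v)
    return bytes(out)

# A mostly-empty, Pilatus-like frame: lots of zeros, one strong "spot".
rng = np.random.default_rng(1)
frame = rng.poisson(0.05, size=(100, 100)).astype(np.int32)
frame[40:43, 60:63] = 50_000

packed = byte_offset_encode(frame)
print(f"raw: {frame.nbytes} bytes, packed: {len(packed)} bytes "
      f"({frame.nbytes / len(packed):.1f}x smaller, losslessly)")

On a fine-sliced frame the deltas are overwhelmingly zero, which is also why a general-purpose compressor such as bzip2 can then squeeze the stream further, still without losing anything.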
Re: [ccp4bb] image compression
Hi I am not a fan of one-way computational processes with unique data. Thoughts anyone? Cheerio, Graeme I agree. Harry -- Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, Cambridge, CB2 0QH http://www.iucr.org/resources/commissions/crystallographic-computing/schools/mieres2011
Re: [ccp4bb] image compression
Le 08/11/11 10:15, Kay Diederichs a écrit : Hi James, I see no real need for lossy compression datasets. They may be useful for demonstration purposes, and to follow synchrotron data collection remotely. But for processing I need the real data. It is my experience that structure solution, at least in the difficult cases, depends on squeezing out every bit of scattering information from the data, as much as is possible with the given software. Using a lossy-compression dataset in this situation would give me the feeling if structure solution does not work out, I'll have to re-do everything with the original data - and that would be double work. Better not start going down that route. The CBF byte compression puts even a 20bit detector pixel into a single byte, on average. These frames can be further compressed, in the case of Pilatus fine-slicing frames, using bzip2, almost down to the level of entropy in the data (since there are so many zero pixels). And that would be lossless. Storing lossily-compressed datasets would of course not double the diskspace needed, but would significantly raise the administrative burdens. Just to point out my standpoint in this whole discussion about storage of raw data: I've been storing our synchrotron datasets on disks, since 1999. The amount of money we spend per year for this purpose is constant (less than 1000€). This is possible because the price of a GB disk space drops faster than the amount of data per synchrotron trip rises. So if the current storage is full (about every 3 years), we set up a bigger RAID (plus a backup RAID); the old data, after copying over, always consumes only a fraction of the space on the new RAID. So I think the storage cost is actually not the real issue - rather, the real issue has a strong psychological component. People a) may not realize that the software they use is constantly being improved, and that needs data which cover all the corner cases; b) often do not wish to give away something because they feel it might help their competitors, or expose their faults. best, Kay (XDS co-developer) Hi Kay and others, I completely agree with you. Datalove, 3 :-) -- Miguel Architecture et Fonction des Macromolécules Biologiques (UMR6098) CNRS, Universités d'Aix-Marseille I II Case 932, 163 Avenue de Luminy, 13288 Marseille cedex 9, France Tel: +33(0) 491 82 55 93 Fax: +33(0) 491 26 67 20 mailto:miguel.ortiz-lombar...@afmb.univ-mrs.fr http://www.afmb.univ-mrs.fr/Miguel-Ortiz-Lombardia
Re: [ccp4bb] image compression
Um, but isn't Crystallography based on a series of one-way computational processes:

photons -> images
images -> {structure factors, symmetry}
{structure factors, symmetry, chemistry} -> solution
{structure factors, symmetry, chemistry, solution} -> refined solution

At each stage we tolerate a certain amount of noise in going backwards. Certainly it is desirable to have the original data to be able to go forwards, but until the arrival of pixel array detectors, we were very far from having the true original data, and even pixel array detectors don't capture every single photon. I am not recommending lossy compressed images as a perfect replacement for lossless compressed images, any more than I would recommend structure factors as a replacement for images. It would be nice if we all had large budgets, huge storage capacity and high network speeds and if somebody would repeal the speed of light and other physical constraints, so that engineering compromises were never necessary, but as James has noted, accepting such engineering compromises has been of great value to our colleagues who work with the massive image streams of the entertainment industry. Without lossy compression, we would not have the _higher_ image quality we now enjoy in the less-than-perfectly-faithful HDTV world that has replaced the highly faithful, but lower capacity, NTSC/PAL world. Please, in this, let us not allow the perfect to be the enemy of the good. James is proposing something good. Regards, Herbert = Herbert J. Bernstein Professor of Mathematics and Computer Science Dowling College, Kramer Science Center, KSC 121 Idle Hour Blvd, Oakdale, NY, 11769 +1-631-244-3035 y...@dowling.edu = On Tue, 8 Nov 2011, Harry Powell wrote: Hi I am not a fan of one-way computational processes with unique data. Thoughts anyone? Cheerio, Graeme I agree. Harry -- Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, Cambridge, CB2 0QH http://www.iucr.org/resources/commissions/crystallographic-computing/schools/mieres2011
Re: [ccp4bb] Installation of CCP4 under Windows 7
Hi, I have encountered this problem with CCP4 6.2.0, as well as with earlier versions over the past few years. Whenever I run the CCP4 installer for all users on my Windows Vista PC as administrator, it only creates desktop icons and a start menu item for the administrator account, not for other users. It appears that the CCP4 desktop icon shortcut is put in C:\Users\Administrator\Desktop and the start menu CCP4 folder is put in C:\Users\Administrator\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\CCP4-Packages-6.2.0. Both of these paths are incorrect, as they are invisible when logging in as another user. I'm not sure where the CCP4 desktop icon shortcut should go, but placing the startup folder in C:\ProgramData\Microsoft\Windows\Start Menu\Programs makes it accessible to other users. On a separate note, it also appears that there is a bug in the ActiveTcl installer which is recommended to be installed on Vista platforms: it only makes file associations with tcl files for the user who installed this program. Consequently, when double clicking the CCP4 icon as a user other than administrator, Windows prompts the user asking what program the file should be opened with. Again, this can be overcome by making the wish.exe program from the ActiveTcl folder the default program to use. Regards, Robert Oeffner, Ph.D. Research Associate, Read group Department of Haematology, University of Cambridge Cambridge Institute of Medical Research Wellcome Trust / MRC Building, Hills Road, Cambridge, CB2 0XY www-structmed.cimr.cam.ac.uk, tel:01223763234
Re: [ccp4bb] image compression
Dear Herbert, Sorry, the point I was getting at was that the process is one way, but if it is also *destructive*, i.e. the original master is not available, then I would not be happy. If the master copy of what was actually recorded is available from a tape someplace, perhaps not all that quickly, then to my mind that's fine. When we go from images to intensities, the images still exist. And by and large the intensities are useful enough that you don't go back to the images again. This is worth investigating, I believe, which is why I made that proposal. Mostly I listen to mp3s as they're convenient, but I still buy CDs rather than buying direct off e.g. iTunes, and yes, an H264-compressed video stream is much nicer to watch than VHS. Best wishes, Graeme

On 8 November 2011 12:17, Herbert J. Bernstein y...@bernstein-plus-sons.com wrote: Um, but isn't Crystallography based on a series of one-way computational processes: photons -> images, images -> {structure factors, symmetry}, {structure factors, symmetry, chemistry} -> solution, {structure factors, symmetry, chemistry, solution} -> refined solution. At each stage we tolerate a certain amount of noise in going backwards. Certainly it is desirable to have the original data to be able to go forwards, but until the arrival of pixel array detectors, we were very far from having the true original data, and even pixel array detectors don't capture every single photon. I am not recommending lossy compressed images as a perfect replacement for lossless compressed images, any more than I would recommend structure factors as a replacement for images. It would be nice if we all had large budgets, huge storage capacity and high network speeds and if somebody would repeal the speed of light and other physical constraints, so that engineering compromises were never necessary, but as James has noted, accepting such engineering compromises has been of great value to our colleagues who work with the massive image streams of the entertainment industry. Without lossy compression, we would not have the _higher_ image quality we now enjoy in the less-than-perfectly-faithful HDTV world that has replaced the highly faithful, but lower capacity, NTSC/PAL world. Please, in this, let us not allow the perfect to be the enemy of the good. James is proposing something good. Regards, Herbert = Herbert J. Bernstein Professor of Mathematics and Computer Science Dowling College, Kramer Science Center, KSC 121 Idle Hour Blvd, Oakdale, NY, 11769 +1-631-244-3035 y...@dowling.edu = On Tue, 8 Nov 2011, Harry Powell wrote: Hi I am not a fan of one-way computational processes with unique data. Thoughts anyone? Cheerio, Graeme I agree. Harry -- Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, Cambridge, CB2 0QH http://www.iucr.org/resources/commissions/crystallographic-computing/schools/mieres2011
[ccp4bb] 4th Winter School on soft X-rays in Macromolecular Crystallography
4th Winter School on soft X-rays in Macromolecular Crystallography

We are pleased to announce that the 4th Winter School on soft X-rays in Macromolecular Crystallography will take place at the European Synchrotron Radiation Facility, Grenoble, France, 6th-8th February 2012. An increasing number of new crystal structures of biological macromolecules are solved exploiting anomalous signals available at longer wavelengths. The main advantage of using longer wavelengths is that the anomalous signal from innate sulphur, phosphorus or other light atoms (e.g. Ca, K, Cl) can be used to obtain phasing information, thus obviating the need to prepare the derivative crystals usually used in macromolecular crystal structure solution. However, at the longer wavelengths currently routinely accessible, such anomalous signals are rather small (usually ~1%). Special care must thus be given to experimental setups and data collection protocols. The Winter School brings together experts in the hardware and software required to successfully perform such experiments.

The following topics will be covered:
- Optimised experimental setups for diffraction data collection at longer wavelengths
- Advanced data collection strategies, taking into account radiation damage
- Optimised protocols of data processing for longer-wavelength experiments
- Anomalous scattering substructure determination using longer-wavelength X-rays

The Winter School will be held under the auspices of the ESRF Users' Meeting and the number of participants is limited to 20. Confirmed speakers at the workshop (see http://www.esrf.fr/events/conferences/users-meeting-2012-workshops/mx-school for the provisional program) include:

M. Cianci, EMBL Hamburg, Germany
K. Djinovic-Carugo, Vienna Biocenter, Austria
D. de Sanctis, ESRF, Grenoble, France
P. Johansson, AstraZeneca Structural Chemistry Laboratory, Molndal, Sweden
J. Liu, National Laboratory of Biomacromolecules, Beijing, China
R. Giordano, ESRF, Grenoble, France
G. Leonard, ESRF, Grenoble, France
C. Mueller-Dieckmann, ESRF, Grenoble, France
J. Pflugrath, Rigaku, USA
A. Popov, ESRF, Grenoble, France
I. Uson, IBMB-CSIC, Barcelona, Spain
A. Wagner, Diamond Light Source, UK
B.C. Wang, University of Georgia, USA
M. Wang, Swiss Light Source, Villigen, Switzerland
M.S. Weiss, Helmholtz-Zentrum für Materialien und Energie, Germany

The Winter School registration fee of €150 includes all meals, 4 nights' accommodation in the ESRF Guesthouse as well as registration for the ESRF Users' Meeting ( http://www.esrf.fr/events/conferences/users-meeting-2012-workshops/preliminary-programme ). Applications for participation in the Winter School should include a C.V., a letter of motivation, a poster abstract and (where appropriate) a recommendation letter from a PhD supervisor, and should be sent by Monday 12th December to the Winter School organisers at mx-winterschool2...@esrf.fr Further information concerning the workshop and application procedures can be found at http://www.esrf.fr/events/conferences/users-meeting-2012-workshops/mx-school On behalf of the organizers, Gordon Leonard
Re: [ccp4bb] Installation of CCP4 under Windows 7
Hi, Thanks for listing all the problems. We are testing a new installer for Windows now. It's written from scratch (we switched from InstallShield to WiX) and if we don't find any issues with it we'll put it on ftp tomorrow. ActiveTcl should not be necessary with this and future versions. In September we updated our build of Tcl and friends and it seems to work well with ccp4i and imosflm. We'll remove the recommendation of ActiveTcl from the website soon. Cheers Marcin -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Robert Oeffner Sent: 08 November 2011 12:27 To: ccp4bb Subject: Re: [ccp4bb] Installation of CCP4 under Windows 7 Hi, I have encountered this problem with CCP4 6.2.0 as well as for the past few years. Whenever I run the CCP4 installer for all users on my Windows Vista PC as administrator it only creates desktop icons and startup menu item for the administrator account, not for other users. It appears that the CCP4 desktop icon shortcut is put in C:\Users\Administrator\Desktop and the start menu CCP4 folder is put in C:\Users\Administrator\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\CCP4-Packages-6.2.0. Both of these paths are incorrect as they are invisible when logging in as another user. I'm not sure where the CCP4 desktop icon shortcut should go but placing the startup folder in C:\ProgramData\Microsoft\Windows\Start Menu\Programs makes it accessible to other users. On a separate note, it also appears that there is a bug in the ActiveTCL installer which is recommended to be installed on Vista platforms. It only makes file associations with tcl files for the user who installed this program. Consequently when double clicking the CCP4 icon as a different user than administrator windows prompts the user asking what program the file should be opened with. Again this can be overcome by making the wish.exe program from the ActiveTcl folder the default program to use. Regards, Robert Oeffner, Ph.D. Research Associate, Read group Department of Haematology, University of Cambridge Cambridge Institute of Medical Research Wellcome Trust / MRC Building, Hills Road, Cambridge, CB2 0XY www-structmed.cimr.cam.ac.uk, tel:01223763234
Re: [ccp4bb] weight matrix and R-FreeR gap optimization
On 11/08/2011 05:39 AM, james09 pruza wrote: Dear ccp4bbers, I wonder if someone can help me defining proper weight matrix term in Refmac5 to lower the R-FreeR gap. The log file indicates weight matrix of 1.98 with a gap of 7. Thanks for suggestions in advance. James What is your resolution? The gap is usually wider at lower resolution. Eleanor
Re: [ccp4bb] weight matrix and R-FreeR gap optimization
What is your resolution? The gap is usually wider at lower resolution. Here is a figure displaying the distribution of R-free minus R gap statistics: http://www.ruppweb.org/garland/gallery/Ch12/pages/Biomolecular_Crystallography_Fig_12-24.htm Cheers, BR -Original Message- From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Eleanor Dodson Sent: Tuesday, November 08, 2011 8:11 AM To: CCP4BB@JISCMAIL.AC.UK Subject: Re: [ccp4bb] weight matrix and R-FreeR gap optimization On 11/08/2011 05:39 AM, james09 pruza wrote: Dear ccp4bbers, I wonder if someone can help me defining proper weight matrix term in Refmac5 to lower the R-FreeR gap. The log file indicates weight matrix of 1.98 with a gap of 7. Thanks for suggestions in advance. James What is your resolution? The gap is usually wider at lower resolution. Eleanor
[ccp4bb] phaser openmp
Could anyone point me towards instructions on how to get/build parallelized phaser binary on linux? I searched around but so far found nothing. The latest updated phaser binary doesn't seem to be parallelized. Apologies if this has been resolved before - just point at the relevant thread, please. -- I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
Re: [ccp4bb] phaser openmp
Hi Ed, in the CCP4 distribution, openmp is not enabled by default, and there seems to be no easy way to enable it (i.e. by setting a flag at the configure stage). On the other hand, you can easily create a separate build for phaser that is openmp enabled and use phaser from there. To do this, create a new folder, say phaser-build, cd into it, and issue the following commands (this assumes you are using bash):

$ python $CCP4/lib/cctbx/cctbx_sources/cctbx_project/libtbx/configure.py --repository=$CCP4/src/phaser/source phaser --build-boost-python-extensions=False --enable-openmp-if-possible=True
$ . ./setpaths.sh   (source ./setpaths.csh with csh)
$ libtbx.scons   (if you have several CPUs, add -jX where X is the number of CPUs you want to use for compilation)

This will build phaser that is openmp-enabled. You can also try passing the --static-exe flag (to configure.py), in which case the executable is static and can be relocated without any headaches. This works with certain compilers. Let me know if there are any problems! BW, Gabor

On Nov 8 2011, Ed Pozharski wrote: Could anyone point me towards instructions on how to get/build parallelized phaser binary on linux? I searched around but so far found nothing. The latest updated phaser binary doesn't seem to be parallelized. Apologies if this has been resolved before - just point at the relevant thread, please.
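One practical note on running such a build (this is generic OpenMP behaviour, not something taken from the Phaser documentation): the number of threads an OpenMP-enabled binary uses is normally controlled with the OMP_NUM_THREADS environment variable, e.g. export OMP_NUM_THREADS=4 in bash before starting phaser; if it is left unset, most OpenMP runtimes default to using all available cores.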
Re: [ccp4bb] image compression
At the risk of putting this thread back on-topic, my original question was not "should I just lossily compress my images and throw away the originals?". My question was: would you download the compressed images first? So far, no one has really answered it. I think it is obvious that of course we would RATHER have the original data, but if access to the original data is slow (by a factor of 30 at best) then can the mp3 version of diffraction data play a useful role in YOUR work? Taking Graeme's request from a different thread as an example, he would like to see stuff in P21 with a 90 degree beta angle. There are currently ~609 examples of this in the PDB. So, I ask again: which one would you download first? 1aip? (It is first alphabetically.) Then again, if you just email the corresponding authors of all 609 papers, the response rate alone might whittle the number of datasets to deal with down to less than 10. Perhaps even less than 1. -James Holton MAD Scientist

On 11/8/2011 5:17 AM, Graeme Winter wrote: Dear Herbert, Sorry, the point I was getting at was that the process is one way, but if it is also *destructive* i.e. the original master is not available then I would not be happy. If the master copy of what was actually recorded is available from a tape someplace perhaps not all that quickly then to my mind that's fine. When we go from images to intensities, the images still exist. And by and large the intensities are useful enough that you don't go back to the images again. This is worth investigating I believe, which is why I made that proposal. Mostly I listen to mp3's as they're convenient, but I still buy CD's not direct off e.g. itunes, and yes a H264 compressed video stream is much nicer to watch than VHS. Best wishes, Graeme On 8 November 2011 12:17, Herbert J. Bernstein y...@bernstein-plus-sons.com wrote: Um, but isn't Crystallography based on a series of one-way computational processes: photons -> images, images -> {structure factors, symmetry}, {structure factors, symmetry, chemistry} -> solution, {structure factors, symmetry, chemistry, solution} -> refined solution. At each stage we tolerate a certain amount of noise in going backwards. Certainly it is desirable to have the original data to be able to go forwards, but until the arrival of pixel array detectors, we were very far from having the true original data, and even pixel array detectors don't capture every single photon. I am not recommending lossy compressed images as a perfect replacement for lossless compressed images, any more than I would recommend structure factors are a replacement for images. It would be nice if we all had large budgets, huge storage capacity and high network speeds and if somebody would repeal the speed of light and other physical constraints, so that engineering compromises were never necessary, but as James has noted, accepting such engineering compromises has been of great value to our colleagues who work with the massive image streams of the entertainment industry. Without lossy compression, we would not have the _higher_ image quality we now enjoy in the less-than-perfectly-faithful HDTV world that has replaced the highly faithful, but lower capacity, NTSC/PAL world. Please, in this, let us not allow the perfect to be the enemy of the good. James is proposing something good. Regards, Herbert = Herbert J.
Bernstein Professor of Mathematics and Computer Science Dowling College, Kramer Science Center, KSC 121 Idle Hour Blvd, Oakdale, NY, 11769 +1-631-244-3035 y...@dowling.edu = On Tue, 8 Nov 2011, Harry Powell wrote: Hi I am not a fan of one-way computational processes with unique data. Thoughts anyone? Cheerio, Graeme I agree. Harry -- Dr Harry Powell, MRC Laboratory of Molecular Biology, MRC Centre, Hills Road, Cambridge, CB2 0QH http://www.iucr.org/resources/commissions/crystallographic-computing/schools/mieres2011
Re: [ccp4bb] Archiving Images for PDB Depositions
All, We have been following the CCP4BB discussion with interest. As has been mentioned on several occasions, the JCSG has maintained, for several years now, an open archive of all diffraction datasets associated with our deposited structures. Overall this has been a highly positive experience and many developers, researchers, teachers and students have benefited from our archive. We currently have close to 100 registered users of our archive and we seem to receive a new batch of users each time our archive is acknowledged in a paper or is mentioned at a conference. Building on this initial success, we are currently extending our archive to include unsolved datasets, which will help us more readily share data and collaborate with methods developers on some of our less tractable datasets. We are also planning to include screening images for all crystals evaluated as part of the JCSG pipeline (largely as a feedback tool to help improve crystal quality). At JCSG, we benefit tremendously from our central database, which already tracks all required metadata associated with any crystal. Thus I agree with other comments that the cost of such an undertaking should not be underestimated. The cost of the hardware may be modest; however, people and resources are needed to develop and maintain a robust and reliable archive. To date we have not assigned DOIs to our datasets, but we certainly feel this would be of value going forward and are currently considering this option for our revised archive, which is currently in development. If successful then this may form a good prototype system, which could be opened up to a broader community outside of JCSG. We (JCSG) have already shared much of our experiences with the IUCR working group and we would be happy to participate and contribute to any ongoing efforts. Sincerely, Ashley.Deacon JCSG Structure Determination Core Leader
Re: [ccp4bb] image compression
Le 08/11/2011 19:19, James Holton a écrit : At the risk of putting this thread back on-topic, my original question was not should I just lossfully compress my images and throw away the originals. My question was: would you download the compressed images first? So far, noone has really answered it. I think it is obvious that of course we would RATHER have the original data, but if access to the original data is slow (by a factor of 30 at best) then can the mp3 version of diffraction data play a useful role in YOUR work? Taking Graeme's request from a different thread as an example, he would like to see stuff in P21 with a 90 degree beta angle. There are currently ~609 examples of this in the PDB. So, I ask again: which one would you download first?. 1aip? (It is first alphabetically). Then again, if you just email the corresponding authors of all 609 papers, the response rate alone might whittle the number of datasets to deal with down to less than 10. Perhaps even less than 1. -James Holton MAD Scientist Hmm, I thought I had been clear. I will try to be more direct: Given the option, I would *only* download the original, non-lossy-compressed data. At the expense of time, yes. I don't think Graeme's example is very representative of our work, sorry. As long as the option between the two is warranted, I don't care. I just don't see the point for the very same reasons Kay has very clearly exposed. Best regards, -- Miguel Architecture et Fonction des Macromolécules Biologiques (UMR6098) CNRS, Universités d'Aix-Marseille I II Case 932, 163 Avenue de Luminy, 13288 Marseille cedex 9, France Tel: +33(0) 491 82 55 93 Fax: +33(0) 491 26 67 20 mailto:miguel.ortiz-lombar...@afmb.univ-mrs.fr http://www.afmb.univ-mrs.fr/Miguel-Ortiz-Lombardia
Re: [ccp4bb] image compression
Hmmm, so you would, when collecting large data images, say 4 images of 100 MB per second, in the middle of the night, from home, reject seeing compressed images on your data collection software, while the real thing is lingering behind somewhere, to be downloaded and stored later? As opposed to not seeing the images (because your home internet access cannot keep up) and only inspecting 1 in 100 images to see progress? I think there are instances where compressed (lossy or not) images will be invaluable. I know the above situation was not the context, but (y'all may gasp about this) I still have some friends (in the US) who live so far out in the wilderness that only dial-up internet is available, while synchrotrons and the detectors used get better all the time, which means more MB/s produced. James has already said (and I agree) that the original images (with all information) should not necessarily be thrown away. Perhaps a better question would be which would you use for what purpose, since I am convinced that compressed images are useful. I would want to process the real thing, unless I have been shown by scientific evidence that the compressed thing works equally well. It seems reasonable to assume that such evidence can be acquired and/or that we can be shown by evidence what we gain and lose by lossy-compressed images. The key might be being able to choose the best thing for your particular application/case/location etc. So yes, James, of course this is useful and not a waste of time. Mark -Original Message- From: Miguel Ortiz Lombardia miguel.ortiz-lombar...@afmb.univ-mrs.fr To: CCP4BB CCP4BB@JISCMAIL.AC.UK Sent: Tue, Nov 8, 2011 12:29 pm Subject: Re: [ccp4bb] image compression Le 08/11/2011 19:19, James Holton a écrit : At the risk of putting this thread back on-topic, my original question was not should I just lossfully compress my images and throw away the originals. My question was: would you download the compressed images first? So far, noone has really answered it. I think it is obvious that of course we would RATHER have the original data, but if access to the original data is slow (by a factor of 30 at best) then can the mp3 version of diffraction data play a useful role in YOUR work? Taking Graeme's request from a different thread as an example, he would like to see stuff in P21 with a 90 degree beta angle. There are currently ~609 examples of this in the PDB. So, I ask again: which one would you download first?. 1aip? (It is first alphabetically). Then again, if you just email the corresponding authors of all 609 papers, the response rate alone might whittle the number of datasets to deal with down to less than 10. Perhaps even less than 1. -James Holton MAD Scientist Hmm, I thought I had been clear. I will try to be more direct: Given the option, I would *only* download the original, non-lossy-compressed data. At the expense of time, yes. I don't think Graeme's example is very representative of our work, sorry. As long as the option between the two is warranted, I don't care. I just don't see the point for the very same reasons Kay has very clearly exposed. Best regards, -- Miguel Architecture et Fonction des Macromolécules Biologiques (UMR6098) CNRS, Universités d'Aix-Marseille I II Case 932, 163 Avenue de Luminy, 13288 Marseille cedex 9, France Tel: +33(0) 491 82 55 93 Fax: +33(0) 491 26 67 20 mailto:miguel.ortiz-lombar...@afmb.univ-mrs.fr http://www.afmb.univ-mrs.fr/Miguel-Ortiz-Lombardia
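To put rough numbers on Mark's scenario (every figure below is an illustrative assumption, not a measurement): 4 frames per second at 100 MB each is 400 MB/s leaving the detector, and even a generous home connection cannot follow that live.

frame_mb, frames_per_s = 100, 4             # Mark's hypothetical fast-detector rate
raw_rate = frame_mb * frames_per_s          # MB/s produced at the beamline
home_link_mbit = 10                         # assumed home download speed, Mbit/s
home_rate = home_link_mbit / 8              # MB/s

for label, factor in [("raw", 1), ("lossless ~4x", 4), ("lossy ~30x", 30)]:
    rate = raw_rate / factor
    lag = rate / home_rate                  # seconds of transfer per second of collection
    print(f"{label:>14}: {rate:6.1f} MB/s to move, "
          f"{lag:6.1f} s of download per 1 s of collection")

Under these assumptions even a ~30x lossy compression still outruns the home link by an order of magnitude, which is why very aggressive thumbnails (or inspecting only a subset of frames) end up being what remote monitoring actually uses.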
Re: [ccp4bb] image compression
Le 08/11/2011 20:46, mjvdwo...@netscape.net a écrit :

Hmmm, so you would, when collecting large data images, say 4 images, 100MB in size, per second, in the middle of the night, from home, reject seeing compressed images on your data collection software, while the real thing is lingering behind somewhere, to be downloaded and stored later? As opposed to not seeing the images (because your home internet access cannot keep up) and only inspecting 1 in a 100 images to see progress?

1. I don't need to *see* all images to verify whether the collection is going all right. If I collect remotely, I process remotely; there is no need to transfer the images. Data is collected so fast today that you may, even while collecting at the synchrotron, finish the collection without a) actually seeing all the images (cf. Pilatus detectors) or b) keeping your data processing in pace at all. The crystal died or was not collected properly? You try to understand why, you recollect it if possible or you try a new crystal. It's always been like this; it's called trial and error.

2. The ESRF in Grenoble produces thumbnails of the images. If all you want to see is whether there is diffraction, they are good enough and they are useful. They are extremely lossy and useless for anything else.

3. Please, compare contemporary facts. Today's bandwidth is what it is; today's images are *not* 100 MB (yet). When they get there, let us know what the bandwidth is.

I think there are instances where compressed (lossy or not) images will be invaluable. I know the above situation was not the context, but (y'all may gasp about this) I still have some friends (in the US) who live so far out in the wilderness that only dial-up internet is available. That while synchrotrons and the detectors used get better all the time, which means more MB/s produced.

I would understand a situation like the one you describe for a poor, or an embargoed, country where unfortunately there is no other way to connect to a synchrotron. Still, that should be solved by the community in a different way: by gracious cooperation with our colleagues in those countries. Your example is actually quite upsetting, given the current state of affairs in the world.

James has already said (and I agree) that the original images (with all information) should not necessarily be thrown away. Perhaps a better question would be which would you use for what purpose, since I am convinced that compressed images are useful.

I think I was clear: as long as we have access to the original data, I don't care. I would only use the original data.

I would want to process the real thing, unless I have been shown by scientific evidence that the compressed thing works equally well. It seems reasonable to assume that such evidence can be acquired and/or that we can be shown by evidence what we gain and lose by lossy-compressed images. Key might be to be able to choose the best thing for your particular application/case/location etc.

This still assumes that future software will not be able to detect the differences that you cannot see today. This may or may not be true, and the consequences may or may not be important. But there is, I think, reasonable doubt on both questions.

So yes, James, of course this is useful and not a waste of time.

I have said to James, off the list, that he should go on if he's convinced about the usefulness of his approach. For a very scientific reason: I could be wrong. Yet, if we do need to go down the compression path, I think we should prefer lossless options.
Best regards, -- Miguel Architecture et Fonction des Macromolécules Biologiques (UMR6098) CNRS, Universités d'Aix-Marseille I II Case 932, 163 Avenue de Luminy, 13288 Marseille cedex 9, France Tel: +33(0) 491 82 55 93 Fax: +33(0) 491 26 67 20 mailto:miguel.ortiz-lombar...@afmb.univ-mrs.fr http://www.afmb.univ-mrs.fr/Miguel-Ortiz-Lombardia
Re: [ccp4bb] image compression
It would be a good start to get all images written now with lossless compression, instead of the uncompressed images we still get from the ADSC detectors - something that we've been promised for many years. Phil
Re: [ccp4bb] phaser openmp
On Tue, Nov 8, 2011 at 4:22 PM, Francois Berenger beren...@riken.jp wrote: In the past I have been quite badly surprised by the no-acceleration I gained when using OpenMP with some of my programs... :( Amdahl's law is cruel: http://en.wikipedia.org/wiki/Amdahl's_law This is the same reason why GPU acceleration isn't very useful for most crystallography software. -Nat
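As a quick illustration of why OpenMP gains can disappoint: Amdahl's law says that if only a fraction p of the run time is actually parallelised, n cores can never speed the job up by more than 1 / ((1 - p) + p/n). The fractions below are made up for illustration; they are not measurements of Phaser.

def amdahl_speedup(p, n):
    """Best-case speed-up when a fraction p of the run time is parallelised over n cores."""
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.5, 0.9, 0.99):
    print(f"p = {p:4.2f}: " + ", ".join(
        f"{n} cores -> {amdahl_speedup(p, n):4.1f}x" for n in (2, 4, 8, 64)))

Even with 90% of the work parallelised, 8 cores buy less than a 5x speed-up, and the curve flattens quickly beyond that; the serial remainder dominates.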
Re: [ccp4bb] image compression
ADSC has been a leader in supporting compressed CBF's. = Herbert J. Bernstein Professor of Mathematics and Computer Science Dowling College, Kramer Science Center, KSC 121 Idle Hour Blvd, Oakdale, NY, 11769 +1-631-244-3035 y...@dowling.edu = On Tue, 8 Nov 2011, Phil Evans wrote: It would be a good start to get all images written now with lossless compression, instead of the uncompressed images we still get from the ADSC detectors. Something that we've been promised for many years Phil
Re: [ccp4bb] image compression
The mp3/music analogy might be quite appropriate. On some commercial music download sites, there are several options for purchase, ranging from audiophool-grade 24-bit, 192kHz sampled music, to CD-quality (16-bit, 44.1kHz), to mp3 compression and various lossy bit-rates. I am told that the resampling and compression is actually done on the fly by the server, from a single master, and the purchaser chooses what files to download based on cost, ability to play high-res data, degree of canine-like hearing, intolerance for lossy compression with its limited dynamic range, etc. Perhaps that would be the best way to handle it from a central repository, allowing the end-user to decide on the fly. The lossless files could somehow be tagged as such, to avoid confusion. Bill William G. Scott Professor Department of Chemistry and Biochemistry and The Center for the Molecular Biology of RNA 228 Sinsheimer Laboratories University of California at Santa Cruz Santa Cruz, California 95064 USA phone: +1-831-459-5367 (office) +1-831-459-5292 (lab) fax:+1-831-4593139 (fax)
Re: [ccp4bb] phaser openmp
See page 3 of this http://www-structmed.cimr.cam.ac.uk/phaser/ccp4-sw2011.pdf On Wed, 2011-11-09 at 09:22 +0900, Francois Berenger wrote: Hello, How faster is the OpenMP version of Phaser versus number of cores used? In the past I have been quite badly surprised by the no-acceleration I gained when using OpenMP with some of my programs... :( Regards, F. On 11/09/2011 02:59 AM, Dr G. Bunkoczi wrote: Hi Ed, in the CCP4 distribution, openmp is not enabled by default, and there seems to be no easy way to enable it (i.e. by setting a flag at the configure stage). On the other hand, you can easily create a separate build for phaser that is openmp enabled and use phaser from there. To do this, create a new folder, say phaser-build, cd into it, and issue the following commands (this assumes you are using bash): $ python $CCP4/lib/cctbx/cctbx_sources/cctbx_project/libtbx/configure.py --repository=$CCP4/src/phaser/source phaser --build-boost-python-extensions=False --enable-openmp-if-possible=True $ . ./setpaths.sh (source ./setpaths.csh with csh) $ libtbx.scons (if you have several CPUs, add -jX where X is the number of CPUs you want to use for compilation) This will build phaser that is openmp-enabled. You can also try passing the --static-exe flag (to configure.py), in which case the executable is static and can be relocated without any headaches. This works with certain compilers. Let me know if there are any problems! BW, Gabor On Nov 8 2011, Ed Pozharski wrote: Could anyone point me towards instructions on how to get/build parallelized phaser binary on linux? I searched around but so far found nothing. The latest updated phaser binary doesn't seem to be parallelized. Apologies if this has been resolved before - just point at the relevant thread, please.