Re: [ccp4bb] high z-scores, negative LLG in Phaser
Dear Sean,

What a negative LLG combined with a large Z-score usually means is that the answer is correct, but that the model doesn't predict the data as well as expected. One possible reason is that you told Phaser that the model is better than it really is (e.g. provided identity values higher than they really are -- for instance, it's a common mistake to set the sequence identity to 1 for a homology model, when it should be set to the sequence identity with the template used to build the model -- alternatively, the RMS error of the model is higher than one would expect from the sequence identity, perhaps because of domain movements).

Another possibility is that you told Phaser the model is more complete than it really is -- the defined COMPOSITION has to describe what is in the asymmetric unit of the crystal, not what is in the model (another common mistake!).

Something else that could play a role is that, under some circumstances, Phaser overestimates the accuracy of the structure factors derived from an ensemble; you could try scoring the individual models, and you should make sure that the individual structures in the ensemble are really well superimposed.

If none of this explains your LLG values, then you could send me the logfile (preferably offline to avoid filling up too many mailboxes) and I could see if there's anything obvious in the output.

Best wishes,

Randy Read

On 17 Sep 2009, at 19:06, Sean Gay wrote:

I have a 2.0A data set that I solved using an ensemble of 5 related structures in Phaser. My Z-scores for the solution are fantastic (RFZ=28.7, TFZ=24.6), but my LLG is very negative (-698.2). The LLG increases by almost 800 (started at -1488.6) during the course of the run. The density for the solution is great and the solution model fits it very well. I'm wondering why the Z-scores and LLG contradict each other. Should I be happy with the large increase in LLG or should I be concerned about the final value still being negative?

Sean C. Gay, PhD
Postdoctoral Scholar
Skaggs School of Pharmacy and Pharmaceutical Sciences
University of California, San Diego
9500 Gilman Dr.
La Jolla, CA 92093

--
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research    Tel: +44 1223 336500
Wellcome Trust/MRC Building                 Fax: +44 1223 336827
Hills Road                                  E-mail: rj...@cam.ac.uk
Cambridge CB2 0XY, U.K.                     www-structmed.cimr.cam.ac.uk
Re: [ccp4bb] I compressed my images by ~ a factor of two, and they load and process in mosflm faster
Just to comment on this, my friend in the computer game industry insists that compression begets speed in almost all data handling situations. This will be worth bearing in mind as we start to have more fine-sliced Pilatus 6M (or similar) datasets to deal with.

Cheers,

David.

-----Original Message-----
From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of William G. Scott
Sent: 17 September 2009 22:48
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] I compressed my images by ~ a factor of two, and they load and process in mosflm faster

If you have OS X 10.6, this will impress your friends and save you some disk space:

    % du -h -d 1 mydata
    3.5G    mydata

    mv mydata mydata.1
    sudo ditto --hfsCompression mydata.1 mydata
    rm -rf mydata.1

    % du -h -d 1 mydata
    1.8G    mydata

This does hfs filesystem compression, so the images are still recognized by mosflm, et al. I think they process a bit faster too, because half the information is packed into the resource fork.

This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom
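The `du` figures quoted above report allocated disk blocks rather than the logical file length, which is why a transparently compressed directory looks smaller while every program still reads the full images. A minimal Python sketch of the same comparison follows; the function name is hypothetical, and it assumes a POSIX system where `os.stat` exposes `st_blocks` in 512-byte units.

```python
import os
import stat


def logical_and_allocated_bytes(root):
    """Return (logical, allocated) byte totals for regular files under root.

    logical   : sum of st_size, i.e. what a reading program sees
    allocated : sum of st_blocks * 512, i.e. what du-style tools count
    """
    logical = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            logical += st.st_size
            # POSIX-only field: number of 512-byte blocks allocated
            allocated += st.st_blocks * 512
    return logical, allocated


if __name__ == "__main__":
    size, on_disk = logical_and_allocated_bytes(".")
    print("logical bytes:  ", size)
    print("allocated bytes:", on_disk)
```

On a filesystem with transparent compression the allocated total can come out well below the logical total; on an ordinary filesystem it is usually slightly above it because of block rounding.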
Re: [ccp4bb] I compressed my images by ~ a factor of two, and they load and process in mosflm faster
Hi David,

If the data compression is carefully chosen you are right: lossless jpeg2000 compression on diffraction images works very well, but is a spot slow. The CBF compression using the byte offset method is a little less good at compression but massively faster... as you point out, this is the one used in the Pilatus images. I recall that the .pck format used for the MAR image plates had the same property - it was quicker to read in a compressed image than the raw equivalent.

So... once everyone is using the CBF standard for their images, with native lossless compression, it'll save a fair amount in disk space (=£/$), make life easier for people and - perhaps most importantly - save a lot of data transfer time.

Now the funny thing with this is that if we compress the images before we store them, the compression implemented in the file system will be less effective... oh well, can't win em all...

Cheers,

Graeme
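To illustrate why a byte-offset style scheme is fast, here is a simplified Python sketch of delta encoding with escape codes. It is written from the general idea only (small pixel-to-pixel differences in one byte, larger ones behind an escape marker); it is not the CBFlib implementation, the function names are hypothetical, and it omits the 64-bit escape level.

```python
import struct


def byte_offset_encode(pixels):
    """Encode integers as deltas: 1 byte when small, escaped 16/32 bit otherwise."""
    out = bytearray()
    previous = 0
    for value in pixels:
        delta = value - previous
        if -127 <= delta <= 127:
            out += struct.pack("<b", delta)
        elif -32767 <= delta <= 32767:
            # 0x80 marks "a 16-bit delta follows"
            out += b"\x80" + struct.pack("<h", delta)
        elif -2147483647 <= delta <= 2147483647:
            # 0x80 then 0x8000 marks "a 32-bit delta follows"
            out += b"\x80" + struct.pack("<h", -32768) + struct.pack("<i", delta)
        else:
            raise ValueError("delta too large for this sketch")
        previous = value
    return bytes(out)


def byte_offset_decode(data):
    """Invert byte_offset_encode by summing the stored deltas."""
    pixels = []
    previous = 0
    pos = 0
    while pos < len(data):
        (delta,) = struct.unpack_from("<b", data, pos)
        pos += 1
        if delta == -128:  # escape: read 16 bits
            (delta,) = struct.unpack_from("<h", data, pos)
            pos += 2
            if delta == -32768:  # escape: read 32 bits
                (delta,) = struct.unpack_from("<i", data, pos)
                pos += 4
        previous += delta
        pixels.append(previous)
    return pixels


if __name__ == "__main__":
    image = [10, 12, 11, 500, 498, 100000, 100003, 0]
    packed = byte_offset_encode(image)
    print(len(packed), "bytes instead of", 4 * len(image))
    print(byte_offset_decode(packed) == image)
```

Both directions are a single pass with no dictionary or entropy coding, which is consistent with the speed argument made above; the price is a lower compression ratio than a general-purpose codec would reach.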
Re: [ccp4bb] lysozyme
pLysS contains a phage lysozyme. You can get it from any pLysS cells. Ditto pLysE.

Artem

Dear crystallographers,

does anyone happen to have a plasmid containing a lysozyme gene (any naturally occurring sequence) that would be suitable for use as a PCR template? We're hoping to use the plasmid for our lunchtime projects club for biology and chemistry A-level students.

Many thanks in advance!

Camille Shammas
Re: [ccp4bb] lysozyme
Hi Camille,

I don't know if you have any protein labs around you, but if someone is using Rosetta or CodonPlus type E. coli expression strains, those cells usually contain a pLysS plasmid derivative that is chloramphenicol resistant and carries the gene encoding lysozyme among other things (plus the rare tRNA genes). If they don't have the plasmid at hand you can still grow the cells and make a miniprep of the plasmid (low copy) to have a lysozyme gene in your hands.

Hope this helps. Let me know if you can't find it.

--
Pascal F. Egea, PhD
Assistant Professor
UCLA, David Geffen School of Medicine
Department of Biological Chemistry
314 Biomedical Sciences Research Building
office (310)-825-1013
lab (310)-825-8722
email pe...@mednet.ucla.edu
[ccp4bb] Job openings for doctoral candidates and postdoctoral candidates in structural virology
At the Institute of Biochemistry of the University of Lübeck (Germany), job openings for *doctoral candidates and postdoctoral candidates in structural virology* are available to reinforce the ongoing research projects:

1. Postdoctoral candidate in crystallography / structural virology (initially for 2 years, possibly up to 6 years; habilitation option). Salary scale: E13 TVL. Job identity number: 535/09

2. Doctoral candidate for structural virology and drug design (initially for 2 years). The aim of this project, in cooperation with the Heinrich-Pette-Institute in Hamburg, is the inhibition of the nuclear export of HIV-RNA. Salary scale: E13 TVL in part-time: 50%. Job identity number: 536/09

3. Doctoral candidate for structural virology (limited to 3 years). The aim of this project, in cooperation with the Bernhard-Nocht-Institute of tropical medicine in Hamburg, is the elucidation of the structure of the Lassa virus L protein. Salary scale: E13 TVL in part-time: 50%. Job identity number: 537/09

*BMBF-funded Junior Research Group at DESY, Hamburg*

At the "laboratory for structural biology of infection and inflammation", founded by the universities of Lübeck and Hamburg and located on the DESY site, we plan to establish a BMBF-funded Junior Research Group for "structural infection biology using new radiation sources". The aim is to explore new avenues for studying biomolecules and complexes relevant to infection by means of Raman spectroscopy with the UV free electron laser FLASH as well as SAXS and crystallography at the storage ring PETRA III.

4. Junior research group leader (initially for 2 years, up to 5 years possible; habilitation option). Salary scale: E14 TVL. The scientist is expected to participate in the teaching programme for medical students at the University of Lübeck (teaching is in German; the teaching load is governed by § 4 of the Schleswig-Holstein teaching obligation ordinance, the Lehrverpflichtungsverordnung). Job identity number: 538/09

5. Postdoctoral candidate (initially for 2 years). Salary scale: E13 TVL. Job identity number: 539/09

6. Technical assistant (initially for 2 years, up to five years possible). Duties cover sample preparation and technical assistance with the experimental work. Salary scale: up to E 8 TVL in part-time: 50%. Job identity number: 540/09

The Institute of Biochemistry investigates the molecular basis of intracellular infection by RNA viruses (primarily Coronaviruses incl. the SARS virus as well as Influenza virus and Enteroviruses), bacteria and protozoans, with the aim of designing and developing new anti-infective compounds (see PLoS Pathogens 5, e1000428 (2009); Protein Science 18, 6 (2009); J. Mol. Biol. 383, 1081 (2008); Chem. Biol. 15, 597 (2008)). The work covers identification of potential virulence factors by means of proteomics, recombinant production, crystallisation and X-ray structural analysis of these proteins, as well as the design and chemical synthesis of inhibitors. The Institute is well equipped and has direct access to synchrotron radiation at DESY. Please consult our web page (www.biochem.uni-luebeck.de) for further information.

The university is eager to reach a balance between female and male employees and strives to employ disabled persons. Hence, disabled applicants and females are especially requested to apply.

The director of the Institute of Biochemistry, Prof. Dr. Rolf Hilgenfeld, can be contacted for further information. Tel. +49-451-5004060 or Email: hilgenf...@biochem.uni-luebeck.de

Applications (please quote the job identity number!) should be mailed by September 30, 2009, to:

University of Lübeck
Dezernat Personal
Ratzeburger Allee 160
D-23538 Lübeck
Germany.
Re: [ccp4bb] I compressed my images by ~ a factor of two, and they load and process in mosflm faster
Dear all --

I cannot remember exactly, but I thought we had a long discussion on the rightness of using compressed images, especially when considering the loss of information while doing so. What was the conclusion of the debate again? (sorry, too lazy to dig in the archives).

-- Leo --

Chavas Leonard, Ph.D.
Assistant Professor
Structural Biology Research Center
Photon Factory
High Energy Research Organization (KEK)
305-0801 Tsukuba Oho 1-1 Japan
Tel: +81(0)29-864-5642 (4901)
Fax: +81(0)29-864-2801
e-mail: leonard.cha...@kek.jp
Science Advisory Board (BIT Life Sciences)
Editorial Board (JAA)
http://pfweis.kek.jp/~leo
[ccp4bb] Format issue with TLSIN/TLSOUT files - probably explains some refmac problems
Hi all,

I have run into an issue that affects a number of CCP4 programs (and my own code as well).

The problem

Programs that produce TLSOUT descriptions of TLS parameters create a file using the equivalent of Fortran format (9F8.4). Here are two examples:

    TLS
    RANGE 'A 209.' 'A 220.' ALL
    ORIGIN7.895 -62.178 -23.423
    T 0.9518 0.3476 0.5619 -0.0034 0.2309 0.0373
    L18.9522 22.2045 0.6690-19.0722 -1.8277 2.8987
    S-0.2024 -0.2744 0.5817 0.2971 -0.1614 -1.0850 -0.1876 0.0462

    TLS
    RANGE 'B 15.' 'B 21.' ALL
    ORIGIN 13.302 -6.004 38.582
    T 0.0184 0.0726 0.0102 0.0294 -0.0001 0.0303
    L10.9899108.7779 8.3249 28.9340 -8.6452-15.7119
    S-0.5438 0.6025 -0.5212 0.5483 3.1784 1.6311 -0.1214 -0.1901

You see the problem ... If any element of the T, L, or S tensors is greater than or equal to 100, or less than or equal to -10, it fills its whole 8-character field and two numbers run together in the output. This affects several hundred files in the current PDB, including some from my lab, e.g. 3BJE and 3I7F, which I used as examples above.

If a program reads this in using a corresponding Fortran fixed format, OK. But the current versions of TLSANL and REFMAC5 don't do this; they instead use a home-grown free format input routine.

TLSANL

When TLSANL hits one of these files, it exits with the message

    ** INVALID CHARACTER . AT POSITION 16.
    *** FORMAT ERROR ON RECORD: L10.9899108.7779 8.3249 28.9340 -8.6452-15.7119
    *** L OR S SPECIFICATION MISSING FOR TLS GROUP 12 12

This is annoying, but at least it's obvious that something went wrong.

Refmac

When Refmac hits one of these, the result is more insidious. Instead of exiting with an error message, it prints a small warning, stores a mangled set of values, and continues. Here is a snippet from the log file from refinement of 3I7F:

    Data line--- TLS
    Data line--- RANGE 'A 209.' 'A 220.' ALL
    Data line--- ORIGIN7.895 -62.178 -23.423
    Data line--- T 0.9518 0.3476 0.5619 -0.0034 0.2309 0.0373
    Data line--- L18.9522 22.2045 0.6690-19.0722 -1.8277 2.8987
    *** Warning Illegal number in field4
    Data line--- S-0.2024 -0.2744 0.5817 0.2971 -0.1614 -1.0850 -0.1876 0.0462
    [snip]
    Initial TLS parameters
    TLS group3:
    T tensor ( 3) =0.952 0.348 0.562 -0.003 0.231 0.037
    L tensor ( 3) = 18.952 22.204 0.000 -1.828 2.899 0.000
    S tensor ( 3) = -0.202 -0.274 0.582 0.297 -0.161 -1.085 -0.188 0.046

So the input tensor L

    L18.9522 22.2045 0.6690-19.0722 -1.8277 2.8987

has become

    L18.952 22.204 0.000 -1.828 2.899 0.000

This perturbs the refinement, sometimes fatally. In fact, I think this is the reason I have had problems with TLS refinement in recent refmac versions. Every time you recycle the TLSOUT as the TLSIN for the next refinement round, you risk kicking one or more of the TLS group descriptions out into the next county.

TLSMD

The TLSIN files produced by the TLSMD server can trigger the same problem. This may explain why some people have reported problems with using the pair of files (XYZIN TLSIN) returned by the server for use in refmac refinement.

Possible solutions

The obvious fix is to change all programs that create a TLSOUT file to guarantee that the tensor elements are separated by whitespace. At a minimum, this includes

    tlsmd
    tlsextract (from my lab, I've already changed our in-house copies)
    refmac5
    anisoanl

The new format could either be the equivalent of Fortran

    ( 9(X,F8.4))
or
    ( 9(X,F8.3))

The first of these preserves the precision, but would break any program that uses a fixed format input statement describing the current format. The second is backward compatible with existing fixed format input programs, but loses one decimal of precision.

Both TLSANL and REFMAC are happy if you add the extra whitespace, but I don't know what other programs out there might break because they use fixed format input.

Please discuss. I want to modify the TLSMD server output accordingly.

cheers,

Ethan

--
Ethan A Merritt
Biomolecular Structure Center
University of Washington, Seattle 98195-7742
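The run-together fields and the proposed whitespace fix can be reproduced in a few lines. The following Python sketch is only an illustration, not code from any of the programs named above: the `format_record` helper and the 4-character record label are assumptions, and `%8.4f` stands in for Fortran F8.4.

```python
def format_record(label, values, separated=False):
    """Write one tensor record with F8.4-style fields.

    separated=False mimics (9F8.4): fields may touch each other.
    separated=True  mimics (9(X,F8.4)): a blank precedes every field.
    The 4-character label width is an assumption made for this sketch.
    """
    field = " %8.4f" if separated else "%8.4f"
    return label.ljust(4) + "".join(field % v for v in values)


# L tensor of the second example in the message above
L_VALUES = [10.9899, 108.7779, 8.3249, 28.9340, -8.6452, -15.7119]

old_style = format_record("L", L_VALUES)
new_style = format_record("L", L_VALUES, separated=True)

if __name__ == "__main__":
    print(old_style)
    print(old_style.split())   # whitespace splitting merges touching fields
    print(new_style)
    print(new_style.split())   # every element is its own token
```

A free-format reader behaves like `str.split()` here: with the old layout it receives merged tokens such as `10.9899108.7779`, while with the extra blank it recovers all six elements.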
Re: [ccp4bb] I compressed my images by ~ a factor of two, and they load and process in mosflm faster
http://proteincrystallography.org/ccp4bb/message2284.html

The conclusion was that lossless compression can give us an average of 2.5-fold compression on diffraction images (more if they have no spots) and that lossy compression was something that might anger the caveat gods.

-James Holton
MAD Scientist
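The distinction between lossless and lossy compression can be checked directly: a lossless codec must give back every pixel value unchanged. The Python sketch below uses `zlib` purely as a stand-in for any lossless codec (it is not one of the formats discussed in the thread), and the synthetic image is invented for illustration, so the ratio it prints says nothing about real diffraction data.

```python
import array
import zlib


def lossless_roundtrip(pixels):
    """Compress 16-bit pixel values with zlib and verify exact recovery.

    Returns (compression_ratio, identical).
    """
    raw = array.array("H", pixels).tobytes()
    packed = zlib.compress(raw)
    restored = array.array("H")
    restored.frombytes(zlib.decompress(packed))
    return len(raw) / len(packed), restored.tolist() == list(pixels)


# Synthetic 100 x 100 "image": low, slowly varying background plus one spot
synthetic = [20 + (i * 7) % 5 for i in range(100 * 100)]
synthetic[5050] = 40000

if __name__ == "__main__":
    ratio, identical = lossless_roundtrip(synthetic)
    print("compression ratio: %.1f" % ratio)
    print("pixels identical after round trip:", identical)
```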
Re: [ccp4bb] I compressed my images by ~ a factor of two, and they load and process in mosflm faster
On Friday 18 September 2009 12:47:20 Chavas Leo wrote:

Dear all -- I cannot remember exactly, but I thought we had a long discussion on the rightness of using compressed images, especially when considering the loss of information while doing so.

Not all compression methods cause loss of information.

cheers,

Ethan

--
Ethan A Merritt
Biomolecular Structure Center
University of Washington, Seattle 98195-7742
Re: [ccp4bb] I compressed my images by ~ a factor of two, and they load and process in mosflm faster
I think it important to point out that despite the subject line, Dr. Scott's statement was: "I think they process a bit faster too". Strangely enough, this has not convinced me to re-format my RAID array with a new file system nor re-write all my software to support yet another new file format. I guess I am just lazy that way. Has anyone measured the speed increase? Have Macs become I/O-bound again?

In any case, I think it is important to remember that there are good reasons for leaving image file formats uncompressed. Probably the most important is the activation barrier to new authors writing new programs that read them. fread() is one thing, but finding the third-party code for a particular compression algorithm, navigating a CVS repository and linking to a library are quite another! This is actually quite a leap for those of us who never had any formal training in computer science. Personally, I still haven't figured out how to read pck images, as it is much easier to write jiffy programs for uncompressed data.

For example, if all you want to do is extract a group of pixels (such as a spot), then you have to decompress the whole image! In computer speak: fseek() is rendered useless by compression. This could be why Mar opted not to use the pck compression for their newer CCD-based detectors?

That said, compressed file systems do appear particularly attractive if space is limiting. Apparently HFS can do it, but what about other operating systems? Does anyone have experience with a Linux file system that both supports compression and doesn't get corrupted easily?

-James Holton
MAD Scientist
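The fseek() point can be made concrete with a short Python sketch that pulls a small patch of pixels out of an uncompressed image without reading the rest of the file. The layout used here (a fixed-size header followed by row-major little-endian unsigned 16-bit pixels) and every parameter name are assumptions for illustration, not a description of any particular detector format.

```python
import struct

BYTES_PER_PIXEL = 2  # assumed: unsigned 16-bit pixels


def read_patch(path, header_bytes, width, x0, y0, nx, ny):
    """Read an nx-by-ny patch whose top-left pixel is (x0, y0).

    Only the requested rows are touched: each one is reached with seek(),
    which works because every pixel sits at a predictable byte offset.
    """
    rows = []
    with open(path, "rb") as fh:
        for y in range(y0, y0 + ny):
            fh.seek(header_bytes + (y * width + x0) * BYTES_PER_PIXEL)
            data = fh.read(nx * BYTES_PER_PIXEL)
            rows.append(list(struct.unpack("<%dH" % nx, data)))
    return rows
```

With a compressed pixel stream the byte offset of pixel (x0, y0) is not known in advance, so the same request would require decoding from the start of the image, which is the drawback described above.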
Re: [ccp4bb] Format issue with TLSIN/TLSOUT files - probably explains some refmac problems
I think if you are reading a file format which is defined to be fixed fields, you should read it as fixed fields. For better or for worse, the PDB format is defined so that each field has a particular column that it begins on and a column that it ends on.

I've looked in the PDB format definition on the RCSB website and, while it doesn't specifically give a Fortran format statement for TLS parameters, in its discussion of REMARK 3 it often shows a line of column numbers. I presume those lines indicate that column numbers are important even for data in REMARK 3.

Dale

Ethan Merritt wrote:

Hi all,

I have run into an issue that affects a number of CCP4 programs (and my own code as well).

The problem

Programs that produce TLSOUT descriptions of TLS parameters create a file using the equivalent of Fortran format (9F8.4). Here are two examples:

TLS
RANGE 'A 209.' 'A 220.' ALL
ORIGIN7.895 -62.178 -23.423
T 0.9518 0.3476 0.5619 -0.0034 0.2309 0.0373
L18.9522 22.2045 0.6690-19.0722 -1.8277 2.8987
S-0.2024 -0.2744 0.5817 0.2971 -0.1614 -1.0850 -0.1876 0.0462

TLS
RANGE 'B 15.' 'B 21.' ALL
ORIGIN 13.302 -6.004 38.582
T 0.0184 0.0726 0.0102 0.0294 -0.0001 0.0303
L10.9899108.7779 8.3249 28.9340 -8.6452-15.7119
S-0.5438 0.6025 -0.5212 0.5483 3.1784 1.6311 -0.1214 -0.1901

You see the problem ... If any element of the T, L, or S tensors is greater than 100 or less than -10 then two numbers run together in the output. This affects several hundred files in the current PDB, including some from my lab, e.g. 3BJE and 3I7F, which I used as examples above.

If a program reads this in using a corresponding Fortran fixed format, OK. But the current versions of TLSANL and REFMAC5 don't do this; they instead use a home-grown free format input routine.

TLSANL

When TLSANL hits one of these files, it exits with the message

** INVALID CHARACTER . AT POSITION 16.
*** FORMAT ERROR ON RECORD: L10.9899108.7779 8.3249 28.9340 -8.6452-15.7119
*** L OR S SPECIFICATION MISSING FOR TLS GROUP 12 12

This is annoying, but at least it's obvious that something went wrong.

Refmac

When Refmac hits one of these, the result is more insidious. Instead of exiting with an error message, it prints a small warning, stores a mangled set of values, and continues. Here is a snippet from the log file from refinement of 3I7F:

Data line--- TLS
Data line--- RANGE 'A 209.' 'A 220.' ALL
Data line--- ORIGIN7.895 -62.178 -23.423
Data line--- T 0.9518 0.3476 0.5619 -0.0034 0.2309 0.0373
Data line--- L18.9522 22.2045 0.6690-19.0722 -1.8277 2.8987
*** Warning Illegal number in field4
Data line--- S-0.2024 -0.2744 0.5817 0.2971 -0.1614 -1.0850 -0.1876 0.0462
[snip]
Initial TLS parameters
TLS group3:
T tensor ( 3) =0.952 0.348 0.562 -0.003 0.231 0.037
L tensor ( 3) = 18.952 22.204 0.000 -1.828 2.899 0.000
S tensor ( 3) = -0.202 -0.274 0.582 0.297 -0.161 -1.085 -0.188 0.046

So the input tensor L

L18.9522 22.2045 0.6690-19.0722 -1.8277 2.8987

has become

L18.952 22.204 0.000 -1.828 2.899 0.000

This perturbs the refinement, sometimes fatally. In fact, I think this is the reason I have had problems with TLS refinement in recent refmac versions. Every time you recycle the TLSOUT as the TLSIN for the next refinement round, you risk kicking one or more of the TLS group descriptions out into the next county.

TLSMD

The TLSIN files produced by the TLSMD server can trigger the same problem. This may explain why some people have reported problems with using the pair of files (XYZIN TLSIN) returned by the server for use in refmac refinement.

Possible solutions

The obvious fix is to change all programs that create a TLSOUT file to guarantee that the tensor elements are separated by whitespace. At a minimum, this includes

tlsmd tlsextract (from my lab, I've already changed our in-house copies)
refmac5
anisoanl

The new format could either be the equivalent of Fortran

(9(X,F8.4)) or (9(X,F8.3))

The first of these preserves the precision, but would break any program that uses a fixed format input statement describing the current format. The second is backward compatible with existing fixed format input programs, but loses one decimal of precision.

Both TLSANL and REFMAC are happy if you add the extra whitespace, but I don't know what other programs out there might break because they use fixed format input.

Please discuss. I want to modify the TLSMD server output accordingly.

cheers, Ethan
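The failure mode described in this message can be reproduced in a few lines. The sketch below is illustrative Python, not code from any of the programs named. The helper names (`write_f84`, `read_free`, `read_fixed`, `write_spaced`) are made up for this example, the one-letter record label is left out, and the proposed formats are read literally as one blank followed by an eight-column field.

```python
def write_f84(values):
    """Mimic Fortran (9F8.4): eight columns per number, no separator."""
    return "".join(f"{v:8.4f}" for v in values)


def read_free(record):
    """Naive free-format read: split on whitespace, convert each token."""
    return [float(token) for token in record.split()]


def read_fixed(record, width=8):
    """Fixed-format read: cut the record into eight-column fields."""
    return [float(record[i:i + width]) for i in range(0, len(record), width)]


def write_spaced(values, decimals=4):
    """Mimic 9(X,F8.n): a guaranteed blank before every field."""
    return "".join(f" {v:8.{decimals}f}" for v in values)


# L tensor values from the second example in the message above.
l_tensor = [10.9899, 108.7779, 8.3249, 28.9340, -8.6452, -15.7119]

record = write_f84(l_tensor)
print(repr(record))  # 108.7779 and -15.7119 fill all eight columns

try:
    read_free(record)
except ValueError as err:
    print("free-format read fails:", err)

print("fixed-format read:", read_fixed(record))
print("spaced, 4 decimals:", read_free(write_spaced(l_tensor, 4)))
print("spaced, 3 decimals:", read_free(write_spaced(l_tensor, 3)))
```

Splitting on whitespace trips over the fused token `10.9899108.7779`, while slicing by column recovers all six values. With a blank written before every field, the free-format reader works again, at the cost of one decimal place in the three-decimal variant.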
Re: [ccp4bb] I compressed my images by ~ a factor of two, and they load and process in mosflm faster
The current bottleneck with file systems is the speed of getting data on or off the magnetic surface. So filesystem compression helps, as less data needs to be physically written or read per image. The CPU time spent compressing the data is less than the time saved in writing less data to the surface.

I would be interested to see if the speed-up is the same with a solid state drive, as there is near 'random access' here, unlike with a magnetic drive where the seek time is one of the bottlenecks. For example, mechanical hard drives are limited to about 130MB/s, whereas SSDs can already manage 200MB/s (faster than a first generation SATA interface at 150MB/s can cope with, and one of the drivers behind the 2nd (300MB/s) and 3rd generation (600MB/s) SATA interfaces). The large size of our image files should make them ideal for use with SSDs.

Quoting James Holton jmhol...@lbl.gov:

I think it important to point out that despite the subject line, Dr. Scott's statement was:

"I think they process a bit faster too"

Strangely enough, this has not convinced me to re-format my RAID array with a new file system nor re-write all my software to support yet another new file format. I guess I am just lazy that way. Has anyone measured the speed increase? Have Macs become I/O-bound again?

In any case, I think it is important to remember that there are good reasons for leaving image file formats uncompressed. Probably the most important is the activation barrier to new authors writing new programs that read them. fread() is one thing, but finding the third-party code for a particular compression algorithm, navigating a CVS repository and linking to a library are quite another! This is actually quite a leap for those of us who never had any formal training in computer science. Personally, I still haven't figured out how to read pck images, as it is much easier to write jiffy programs for uncompressed data. For example, if all you want to do is extract a group of pixels (such as a spot), then you have to decompress the whole image! In computer speak: fseek() is rendered useless by compression. This could be why Mar opted not to use the pck compression for their newer CCD-based detectors?

That said, compressed file systems do appear particularly attractive if space is limiting. Apparently HFS can do it, but what about other operating systems? Does anyone have experience with a Linux file system that both supports compression and doesn't get corrupted easily?

-James Holton
MAD Scientist

Graeme Winter wrote:

Hi David,

If the data compression is carefully chosen you are right: lossless jpeg2000 compression on diffraction images works very well, but is a spot slow. The CBF compression using the byte offset method is a little less good at compression but massively faster... as you point out, this is the one used in the pilatus images. I recall that the .pck format used for the MAR image plates had the same property - it was quicker to read in a compressed image than the raw equivalent.

So... once everyone is using the CBF standard for their images, with native lossless compression, it'll save a fair amount in disk space (=£/$), make life easier for people and - perhaps most importantly - save a lot of data transfer time.

Now the funny thing with this is that if we compress the images before we store them, the compression implemented in the file system will be less effective... oh well, can't win 'em all...

Cheers, Graeme

2009/9/18 Waterman, David (DLSLtd,RAL,DIA) david.water...@diamond.ac.uk:

Just to comment on this, my friend in the computer game industry insists that compression begets speed in almost all data handling situations. This will be worth bearing in mind as we start to have more fine-sliced Pilatus 6M (or similar) datasets to deal with.

Cheers, David.

-----Original Message-----
From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of William G. Scott
Sent: 17 September 2009 22:48
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] I compressed my images by ~ a factor of two, and they load and process in mosflm faster

If you have OS X 10.6, this will impress your friends and save you some disk space:

% du -h -d 1 mydata
3.5G    mydata
mv mydata mydata.1
sudo ditto --hfsCompression mydata.1 mydata
rm -rf mydata.1
% du -h -d 1 mydata
1.8G    mydata

This does hfs filesystem compression, so the images are still recognized by mosflm, et al. I think they process a bit faster too, because half the information is packed into the resource fork.
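Two points from this exchange, that a byte-offset scheme is cheap to compute and that compression defeats fseek(), can be seen together in a small sketch. This is a simplified Python illustration of the byte-offset idea (store each pixel as a difference from the previous one, in as few bytes as it needs). The function names, the escape handling and the test image are assumptions made for this example; real CBF files carry headers and other details, and the CBFlib documentation is the authority on the actual format.

```python
import io
import struct


def byte_offset_encode(pixels):
    """Store each pixel as the difference from the previous pixel.
    Small differences take 1 byte; larger ones are flagged with an
    escape value and stored in 2 or 4 bytes (little-endian)."""
    out = bytearray()
    previous = 0
    for value in pixels:
        delta = value - previous
        if -127 <= delta <= 127:
            out += struct.pack("<b", delta)
        elif -32767 <= delta <= 32767:
            out += b"\x80" + struct.pack("<h", delta)
        else:  # assumes the delta fits in a signed 32-bit integer
            out += b"\x80\x00\x80" + struct.pack("<i", delta)
        previous = value
    return bytes(out)


def byte_offset_decode(data, n_pixels):
    """Decode the first n_pixels values; decoding always starts at byte 0."""
    pixels, value, pos = [], 0, 0
    for _ in range(n_pixels):
        (delta,) = struct.unpack_from("<b", data, pos)
        pos += 1
        if delta == -128:            # 1-byte escape: a 2-byte delta follows
            (delta,) = struct.unpack_from("<h", data, pos)
            pos += 2
            if delta == -32768:      # 2-byte escape: a 4-byte delta follows
                (delta,) = struct.unpack_from("<i", data, pos)
                pos += 4
        value += delta
        pixels.append(value)
    return pixels


def read_raw_pixel(raw_file, index):
    """Uncompressed 32-bit pixels: seek straight to the one we want."""
    raw_file.seek(4 * index)
    (value,) = struct.unpack("<i", raw_file.read(4))
    return value


# Hypothetical 1000-pixel "image": gentle background plus two spots.
pixels = [10 + (i % 7) for i in range(1000)]
pixels[200] = 1000
pixels[500] = 40000

raw = struct.pack(f"<{len(pixels)}i", *pixels)
packed = byte_offset_encode(pixels)
print(len(raw), "bytes raw,", len(packed), "bytes packed")

# Raw: one seek and a 4-byte read.
print(read_raw_pixel(io.BytesIO(raw), 500))
# Packed: every earlier delta has to be decoded first.
print(byte_offset_decode(packed, 501)[-1])
```

Because each stored value is a variable-length difference, the position of pixel 500 in the packed stream is not known in advance and its value depends on every pixel before it. That is the trade-off raised above: the packed form is much smaller, but a direct seek to a spot no longer works.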
Re: [ccp4bb] Format issue with TLSIN/TLSOUT files - probably explains some refmac problems
On Friday 18 September 2009 13:40:30 Dale Tronrud wrote:

> I think if you are reading a file format which is defined to be fixed fields, you should read it as fixed fields. For better or for worse, the PDB format is defined so that each field has a particular column that it begins on and a column that it ends on.

Sure. But I'm not talking about fields in a PDB file. PDB file formats have their own major problems, but let's leave that for another day.

The TLSIN/TLSOUT files are not defined or described by the PDB. Their use is a feature peculiar to several CCP4 programs, and to some unknown (to me) number of other programs that may be used in conjunction with CCP4. The TLSMD server is an example.

cheers, Ethan

> I've looked in the PDB format definition on the RCSB website and, while it doesn't specifically give a Fortran format statement for TLS parameters, in its discussion of REMARK 3 it often shows a line of column numbers. I presume those lines indicate that column numbers are important even for data in REMARK 3.
>
> Dale