Re: [ccp4bb] 3ftt and gremlins
Pavel,
I have a problem with the number of reflections that PHENIX reports for refinement.
a. I have almost complete data for the data set: 48071 reflections. I always keep anomalous pairs separated. PHENIX reports (using phenix.model_vs_data.log) 94475 reflections, which is reasonable given 98% complete data and possible mismatches in anomalous pairs. I assume that (+) and (-) are counted separately in this case.
b. Problems come when depositing data to the PDB using their tools. They do not accept (I think) anomalous-separated data, and they complain that the number of reflections is not consistent between the reflection file and the PDB file header.
c. What happens in extreme cases? Let us say that for anisotropic data I have 78% completeness and the anomalous data are separated. Will PHENIX use a complete set of reflections during refinement (for the various density map calculations), or will it use only WHAT IS THERE IN THE INPUT FILE?
Dr Felix Frolow Professor of Structural Biology and Biotechnology Department of Molecular Microbiology and Biotechnology Tel Aviv University 69978, Israel Acta Crystallographica D, co-editor e-mail: mbfro...@post.tau.ac.il Tel: ++972 3640 8723 Fax: ++972 3640 9407 Cellular: ++972 547 459 608
On Mar 12, 2009, at 8:27 PM, Pavel Afonine wrote:
Hi Dale,
1) There is a need for additional validation of structure factor depositions.
PHENIX has tools for this:
1) phenix.cif_as_mtz will convert the PDB data file with diffraction data into an MTZ file. It will automatically figure out whether the data are X-ray Iobs or Fobs, or neutron Fobs or Iobs.
2) The next step is to run phenix.model_vs_data, which takes the MTZ file from above and the corresponding PDB file and gives complete statistics that you can immediately compare with the published values.
Note, phenix.model_vs_data can handle:
- twinned data;
- neutron data;
- unknown ligands (all dictionaries are generated internally);
- PDB files with multiple models (with multiple MODEL records).
I run it every month or two, and so I have a nice list of interesting cases. The database of all data files converted to MTZ is used internally by PHENIX developers for various developments etc... In fact, this was used in the POLYGON validation tool: Acta Cryst. (2009). D65, 297-300
Pavel.
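The mismatch between the two reflection counts in Felix's question can be illustrated with a small sketch. It assumes the indices are plain (h, k, l) tuples and treats only Friedel mates as equivalent; space-group symmetry and centric reflections are deliberately ignored, and the function names are hypothetical, not part of PHENIX or of any PDB tool.

```python
def friedel_canonical(hkl):
    """Pick one representative of the Friedel pair (h,k,l) / (-h,-k,-l)."""
    h, k, l = hkl
    mate = (-h, -k, -l)
    # Tuple comparison gives a deterministic choice between the two mates.
    return max(hkl, mate)


def count_reflections(indices):
    """Return (count with anomalous pairs separated, count with pairs merged)."""
    separated = len(set(indices))
    merged = len({friedel_canonical(hkl) for hkl in indices})
    return separated, merged


# Toy data: one complete Friedel pair plus two reflections without a mate.
example = [(1, 2, 3), (-1, -2, -3), (0, 0, 2), (2, 0, 1)]
separated, merged = count_reflections(example)
print(separated, merged)  # prints: 4 3
```

The separated count is less than twice the merged count whenever some mates are missing, which is the same pattern as 94475 against 48071 in the message above.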
Re: [ccp4bb] 3ftt and gremlins
My personal experience (I frequently re-refine structures I cite, if all the data for that exist in the PDB) is that the PDB contains a significant number of artifacts supported not by reality but only by wild imagination. These artifacts originate from modest, good and excellent laboratories alike. They may not be as resounding as tracing the protein main chain in reverse, but sometimes they support quite significant and resounding conclusions. I myself frequently suffered at the structure validation stage at the PDB, because of the wrong treatment of anisotropic thermal parameters and the resulting erroneous R-factor calculations during structure factor validation. I think this problem is resolved now, or at least I am no longer bothered by annotators about it.
Personally, I do not believe that pointing fingers at misinterpretations and errors that are difficult to catch will help to resolve the situation. Peer review of the data that enter the PDB is unrealistic. Automatic re-refinement of the entire PDB content, which is in tune with modern high-throughput-of-everything approaches, will not solve the problem either. It will produce somewhat better refinement statistics in the best cases. Nothing can replace interpretation of the electron density by the human eye until the AI problem is solved. Responsibility for the correct interpretation of a structure lies with those who publish it and those who cite it.
To sum up, I only wonder: why point the finger at 3ftt, why now, why in public, and why so emotionally?
Dr Felix Frolow Professor of Structural Biology and Biotechnology Department of Molecular Microbiology and Biotechnology Tel Aviv University 69978, Israel Acta Crystallographica D, co-editor e-mail: mbfro...@post.tau.ac.il Tel: ++972 3640 8723 Fax: ++972 3640 9407 Cellular: ++972 547 459 608
Re: [ccp4bb] 3ftt and gremlins
It would be possible for the deposition sites to run a few simple tests to at least find cases where intensities are labelled as amplitudes or vice versa - the truncate plots of moments and cumulative intensities at least would show something was wrong.
Eleanor
Wladek Minor wrote:
Dear All, I just received information from Michael Gao (PDB) that an updated sf file is scheduled to be released on March 17, 2009 to replace the current incorrect sf file.
Wladek
Dr. Wladek Minor Professor of Molecular Physiology and Biological Physics Phone: 434-243-6865 Fax: 434-982-1616 http://krzys.med.virginia.edu/CrystUVa/wladek.htm US-mail address: Department of Molecular Physiology and Biological Physics University of Virginia PO Box 800736, Charlottesville, VA 22908-0736 Fed-Ex address: Department of Molecular Physiology and Biological Physics 1300 Jefferson Park Avenue University of Virginia Charlottesville, VA 22908
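A minimal sketch of the kind of moments test Eleanor describes, not the TRUNCATE implementation. It assumes ideal Wilson statistics for acentric, untwinned data, where the second moment <x^2>/<x>^2 is 2 for intensities and 4/pi (about 1.27) for amplitudes. The tolerance and the function names are illustrative assumptions, and real data would need the comparison done in resolution shells.

```python
import math
import random


def second_moment(values):
    """Return <x^2> / <x>^2 for a list of positive numbers."""
    n = len(values)
    mean = sum(values) / n
    mean_sq = sum(v * v for v in values) / n
    return mean_sq / (mean * mean)


def guess_data_type(values, tol=0.25):
    """Compare the second moment with the ideal acentric Wilson values."""
    m = second_moment(values)
    if abs(m - 2.0) <= tol:
        return "intensity-like"
    if abs(m - 4.0 / math.pi) <= tol:
        return "amplitude-like"
    return "unclear"


# Synthetic acentric data: intensities are exponentially distributed,
# amplitudes are their square roots.
rng = random.Random(0)
intensities = [rng.expovariate(1.0) for _ in range(20000)]
amplitudes = [math.sqrt(i) for i in intensities]

print(guess_data_type(intensities))
print(guess_data_type(amplitudes))
```

A column labelled as amplitudes whose second moment sits near 2 would be a hint that it actually holds intensities, and vice versa.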
Re: [ccp4bb] 3ftt and gremlins
Dear Eleanor, That is a useful suggestion, but in the case of 3ftt it would not have helped: the amplitudes would have looked as healthy as can be (they were calculated!), and it was the associated Sigmas that had absurd values, being in fact phases in degrees. A sanity check on some (recalculated) I/sig(I) statistics could have detected that something was fishy. Looking forward to the archiving of the REAL data ... i.e. the images. Using any other form of data is like having to eat out of someone else's dirty plate! With best wishes, Gerard. -- On Thu, Mar 12, 2009 at 09:22:26AM +, Eleanor Dodson wrote: It would be possible for the deposition sites to run a few simple tests to at least find cases where intensities are labelled as amplitudes or vice versa - the truncate plots of moments and cumulative intensities at least would show something was wrong. Eleanor -- === * * * Gerard Bricogne g...@globalphasing.com * * * * Global Phasing Ltd. * * Sheraton House, Castle Park Tel: +44-(0)1223-353033 * * Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 * * * ===
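The sanity check Gerard mentions could look roughly like the sketch below. It assumes amplitudes and sigmas are already available as plain lists; the median threshold of 2.0 is an arbitrary illustrative value, not a number used by any deposition site, and the function names are hypothetical.

```python
from statistics import median


def sigma_report(f_obs, sig_f):
    """Summarise how plausible a sigma column looks next to its amplitudes."""
    non_positive = sum(1 for s in sig_f if s <= 0)
    ratios = [f / s for f, s in zip(f_obs, sig_f) if s > 0]
    # The median is used because a few small sigmas can dominate a mean.
    median_ratio = median(ratios) if ratios else 0.0
    return {
        "n": len(f_obs),
        "non_positive_sigmas": non_positive,
        "median_f_over_sigma": median_ratio,
    }


def looks_suspicious(report, min_median_ratio=2.0):
    """Flag columns with non-positive sigmas or an implausibly low F/sigma."""
    return (
        report["non_positive_sigmas"] > 0
        or report["median_f_over_sigma"] < min_median_ratio
    )


f_obs = [100.0, 250.0, 80.0, 40.0]
real_sigmas = [3.0, 5.0, 2.5, 2.0]
phase_like = [170.0, 12.0, 355.0, 90.0]  # phases in degrees posing as sigmas

print(looks_suspicious(sigma_report(f_obs, real_sigmas)))
print(looks_suspicious(sigma_report(f_obs, phase_like)))
```

Whether phases in degrees actually drive F/sigma this low depends on the scale of the amplitudes, so a check like this can only raise a flag for a human to look at.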
Re: [ccp4bb] 3ftt and gremlins
Eleanor,
Please note that, so far, the CIF structure factor files have not been sent by the deposition site to authors for verification. I spoke with Helen and John last Saturday and they promised me to change that policy. Alternatively, one can check one's new deposit every Wednesday morning.
Best regards
Wladek
At 05:22 AM 3/12/2009, Eleanor Dodson wrote:
It would be possible for the deposition sites to run a few simple tests to at least find cases where intensities are labelled as amplitudes or vice versa - the truncate plots of moments and cumulative intensities at least would show something was wrong.
Eleanor
Wladek Minor wrote:
Dear All, I just received information from Michael Gao (PDB) that an updated sf file is scheduled to be released on March 17, 2009 to replace the current incorrect sf file.
Wladek
Dr. Wladek Minor Professor of Molecular Physiology and Biological Physics Phone: 434-243-6865 Fax: 434-982-1616 http://krzys.med.virginia.edu/CrystUVa/wladek.htm US-mail address: Department of Molecular Physiology and Biological Physics University of Virginia PO Box 800736, Charlottesville, VA 22908-0736 Fed-Ex address: Department of Molecular Physiology and Biological Physics 1300 Jefferson Park Avenue University of Virginia Charlottesville, VA 22908
Re: [ccp4bb] 3ftt and gremlins
Dear Eleanor and Gerard, PDB testing is performed on MTZ file and can not detect conversion errors. Wladek At 10:03 AM 3/12/2009, Gerard Bricogne wrote: Dear Eleanor, That is a useful suggestion, but in the case of 3ftt it would not have helped: the amplitudes would have looked as healthy as can be (they were calculated!), and it was the associated Sigmas that had absurd values, being in fact phases in degrees. A sanity check on some (recalculated) I/sig(I) statistics could have detected that something was fishy. Looking forward to the archiving of the REAL data ... i.e. the images. Using any other form of data is like having to eat out of someone else's dirty plate! With best wishes, Gerard. -- On Thu, Mar 12, 2009 at 09:22:26AM +, Eleanor Dodson wrote: It would be possible for the deposition sites to run a few simple tests to at least find cases where intensities are labelled as amplitudes or vice versa - the truncate plots of moments and cumulative intensities at least would show something was wrong. Eleanor -- === * * * Gerard Bricogne g...@globalphasing.com * * * * Global Phasing Ltd. * * Sheraton House, Castle Park Tel: +44-(0)1223-353033 * * Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 * * * === Dr. Wladek Minor Professor of Molecular Physiology and Biological Physics Phone: 434-243-6865 Fax: 434-982-1616 http://krzys.med.virginia.edu/CrystUVa/wladek.htm US-mail address: Department of Molecular Physiology and Biological Physics University of Virginia PO Box 800736, Charlottesville, VA 22908-0736 Fed-Ex address: Department of Molecular Physiology and Biological Physics 1300 Jefferson Park Avenue University of Virginia Charlottesville, VA 22908
Re: [ccp4bb] 3ftt and gremlins
Gerard Bricogne wrote: Looking forward to the archiving of the REAL data ... i.e. the images. Using any other form of data is like having to eat out of someone else's dirty plate! That may be so -- but if I'm hungry now, I just pop it in the sink -- I don't publish a call for tenders on an industrial-scale dish-washer, call up the architects and engineers to redesign the room, re-lay the plumbing, vamp up my electricity transformer and install a new drainage system. Which doesn't mean the industrial-scale washer isn't necessary; but honestly, can't we start by just washing the plate?? phx.
Re: [ccp4bb] 3ftt and gremlins
This thread has evolved into two different topics. Just to clarify: 1) There is a need for additional validation of structure factor depositions. My recollection is that the output of SF Check is available to the depositor via ADIT on the RCSB site. I have found that report to be quite helpful in checking for gross errors in my structure factor files. The Electron Density Server performs similar checks. It shows that the R value for 3ftt is 6.4% with a correlation coefficient between Fo and Fc of 0.996. The EDS flags entries as interesting if the calculated R value is more than 5% higher than the reported R value. Maybe it should also note when the R value is more than 5% lower. The tools for validating structure factors exist but perhaps could be put more in the face of the depositor to more strongly encourage that they be looked at. 2) It would be useful to have a central repository of raw diffraction images. Most of the discussion on this point is the technical difficulty of storing this quantity of data. What has not been mentioned is the much greater difficulty of validating these images. You may think the images for an entry have been deposited only to find out that the investigator's wedding photos were accidentally deposited instead. Validating that the images correspond to the claimed structure will be an enormous task; probably more difficult than coming up with enough hard drives to store them all. Dale Tronrud Frank von Delft wrote: Gerard Bricogne wrote: Looking forward to the archiving of the REAL data ... i.e. the images. Using any other form of data is like having to eat out of someone else's dirty plate! That may be so -- but if I'm hungry now, I just pop it in the sink -- I don't publish a call for tenders on an industrial-scale dish-washer, call up the architects and engineers to redesign the room, re-lay the plumbing, vamp up my electricity transformer and install a new drainage system. 
Which doesn't mean the industrial-scale washer isn't necessary; but honestly, can't we start by just washing the plate?? phx.
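Dale's suggestion of a two-sided flag can be sketched as follows. This is not the EDS code: it assumes that "5%" means an absolute difference of 0.05 in R, it applies no scaling between Fo and Fc, and the reported R value in the usage line is a made-up placeholder, since the thread does not give the published value for 3ftt.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum| |Fo| - |Fc| | / sum|Fo| (no scaling applied)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def flag_r_value(r_calc, r_reported, threshold=0.05):
    """Flag recalculated R values that differ from the reported one either way."""
    diff = r_calc - r_reported
    if diff > threshold:
        return "calculated R much higher than reported"
    if diff < -threshold:
        return "calculated R much lower than reported"
    return "consistent"


# 0.064 is the EDS value quoted in the thread; 0.20 is a hypothetical
# reported R used only to exercise the lower-side flag.
print(flag_r_value(0.064, 0.20))
```

An R value far below the reported one, together with a correlation coefficient of 0.996, is what one would expect if the "observed" column actually holds calculated amplitudes.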
Re: [ccp4bb] 3ftt and gremlins
Dear Frank, Thank you for your answer, as unimitable as ever. We do of course have to wash one plate at a time when we each feel the pinch of hunger; but as we do so we should not forget that the PDB is the Central Planning Office, and that the order for that industrial-scale dishwasher has to be filed and lobbied for if it is ever to be delivered. Otherwise we will continue washing our own plates for ever. Besides, on the serious side of the argument, there are other benefits to the deposition of images, as argued in the paper by Joosten et al. in the recent Acta D volume containing the proceedings of last year's CCP4 Study Weekend. With best wishes, Gerard. -- On Thu, Mar 12, 2009 at 04:07:36PM +, Frank von Delft wrote: Gerard Bricogne wrote: Looking forward to the archiving of the REAL data ... i.e. the images. Using any other form of data is like having to eat out of someone else's dirty plate! That may be so -- but if I'm hungry now, I just pop it in the sink -- I don't publish a call for tenders on an industrial-scale dish-washer, call up the architects and engineers to redesign the room, re-lay the plumbing, vamp up my electricity transformer and install a new drainage system. Which doesn't mean the industrial-scale washer isn't necessary; but honestly, can't we start by just washing the plate?? phx. -- === * * * Gerard Bricogne g...@globalphasing.com * * * * Global Phasing Ltd. * * Sheraton House, Castle Park Tel: +44-(0)1223-353033 * * Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 * * * ===
Re: [ccp4bb] 3ftt and gremlins
Hi Dale,
1) There is a need for additional validation of structure factor depositions.
PHENIX has tools for this:
1) phenix.cif_as_mtz will convert the PDB data file with diffraction data into an MTZ file. It will automatically figure out whether the data are X-ray Iobs or Fobs, or neutron Fobs or Iobs.
2) The next step is to run phenix.model_vs_data, which takes the MTZ file from above and the corresponding PDB file and gives complete statistics that you can immediately compare with the published values.
Note, phenix.model_vs_data can handle:
- twinned data;
- neutron data;
- unknown ligands (all dictionaries are generated internally);
- PDB files with multiple models (with multiple MODEL records).
I run it every month or two, and so I have a nice list of interesting cases. The database of all data files converted to MTZ is used internally by PHENIX developers for various developments etc... In fact, this was used in the POLYGON validation tool: Acta Cryst. (2009). D65, 297-300
Pavel.
Re: [ccp4bb] 3ftt and gremlins
Dear Dale, On Thu, Mar 12, 2009 at 11:07:05AM -0700, Dale Tronrud wrote: This thread has evolved into two different topics. Just to clarify: 1) There is a need for additional validation of structure factor depositions. My recollection is that the output of SF Check is available to the depositor via ADIT on the RCSB site. I have found that report to be quite helpful in checking for gross errors in my structure factor files. The Electron Density Server performs similar checks. It shows that the R value for 3ftt is 6.4% with a correlation coefficient between Fo and Fc of 0.996. The EDS flags entries as interesting if the calculated R value is more than 5% higher than the reported R value. Maybe it should also note when the R value is more than 5% lower. The tools for validating structure factors exist but perhaps could be put more in the face of the depositor to more strongly encourage that they be looked at. 2) It would be useful to have a central repository of raw diffraction images. Most of the discussion on this point is the technical difficulty of storing this quantity of data. What has not been mentioned is the much greater difficulty of validating these images. You may think the images for an entry have been deposited only to find out that the investigator's wedding photos were accidentally deposited instead. My suggestion would be to give the images to (say) XDS: it would run successfully on wedding photographs only in rare cases where the group photograph was taken from a helicopter and the guests were arranged in very peculiar ways ... . Validating that the images correspond to the claimed structure will be an enormous task; probably more difficult than coming up with enough hard drives to store them all. Not necessarily, unless the crystalline specimen is very poor. 
In ordinary cases, instead of comparing structure factor amplitudes or intensities calculated from the deposited model to those in a file of deposited values, one would run an integration program (or several of them) on the images, check that cell parameters and space group agree, then run TRUNCATE if amplitudes are desired, to get those observed values (up to some re-indexing). For this to be possible automatically one would have to be much stricter with the completeness and accuracy of the information in image headers produced by various detectors, a step that I think many people would welcome. With best wishes, Gerard. Dale Tronrud Frank von Delft wrote: Gerard Bricogne wrote: Looking forward to the archiving of the REAL data ... i.e. the images. Using any other form of data is like having to eat out of someone else's dirty plate! That may be so -- but if I'm hungry now, I just pop it in the sink -- I don't publish a call for tenders on an industrial-scale dish-washer, call up the architects and engineers to redesign the room, re-lay the plumbing, vamp up my electricity transformer and install a new drainage system. Which doesn't mean the industrial-scale washer isn't necessary; but honestly, can't we start by just washing the plate?? phx. -- === * * * Gerard Bricogne g...@globalphasing.com * * * * Global Phasing Ltd. * * Sheraton House, Castle Park Tel: +44-(0)1223-353033 * * Cambridge CB3 0AX, UK Fax: +44-(0)1223-366889 * * * ===
Re: [ccp4bb] 3ftt and gremlins
Dear Dale,
1) There is a need for additional validation of structure factor depositions. My recollection is that the output of SF Check is available to the depositor via ADIT on the RCSB site. I have found that report to be quite helpful in checking for gross errors in my structure factor files. The Electron Density Server performs similar checks. It shows that the R value for 3ftt is 6.4% with a correlation coefficient between Fo and Fc of 0.996. The EDS flags entries as interesting if the calculated R value is more than 5% higher than the reported R value. Maybe it should also note when the R value is more than 5% lower. The tools for validating structure factors exist but perhaps could be put more in the face of the depositor to more strongly encourage that they be looked at.
Thank you for stressing this again. I had a look at the PDB_REDO results for 3ftt. It uses a procedure similar to that of the EDS, so I wanted to make sure 3ftt was rejected because R(-free) could not be reproduced. However, 3ftt was not in PDB_REDO for a different reason: the R-free set had no information content. That is, all reflections have the same status flag. There are frequent discussions on the CCP4BB about the (un)importance of keeping the R-free set. It would certainly be nice if the PDB also warned the depositor about R-free set problems.
A funny thing about the status flags of 3ftt: all reflections are marked 'x' (i.e. unmeasured). We should have known that there was something wrong with them.
Cheers,
Robbie Joosten
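The R-free problem Robbie describes is easy to test for. The sketch below assumes the status flags have already been read into a list of single-character strings; 'x' for unmeasured comes from the message above, while 'o' (working set) and 'f' (free set) are assumed here to follow the usual mmCIF convention. The function name is hypothetical and this is not the PDB_REDO code.

```python
from collections import Counter


def free_set_is_usable(flags, unmeasured="x"):
    """Return True only if at least two distinct measured flag values occur."""
    counts = Counter(flags)
    measured = {flag: n for flag, n in counts.items() if flag != unmeasured}
    # A single flag value (or none at all) carries no work/free information.
    return len(measured) >= 2


all_unmeasured = ["x"] * 100          # the 3ftt situation described above
normal_split = ["o"] * 95 + ["f"] * 5  # an ordinary work/free partition

print(free_set_is_usable(all_unmeasured))
print(free_set_is_usable(normal_split))
```

A deposition tool could run a check of this kind and warn the depositor before the entry is released.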
[ccp4bb] 3ftt and gremlins
Dear Michael,
As we already wrote to Helen and John, the structure factors for our deposit (PDB ID 3FTT and RCSB ID RCSB051033) were mis-processed. Currently, as was pointed out on CCP4BB, the PDB reports calculated amplitudes and phases instead of experimental amplitudes and sigmas. Most probably our deposited MTZ file was wrongly processed because of non-standard labels for some data (F_set1 instead of F and SIGF_set1 instead of SIGF). Could you send me, as soon as possible, information on the processing of our data file and correct our deposit? If necessary, I will send you a file with the structure factors. I am including the letter that I sent to Helen and John on Saturday morning:
Dear Helen and John,
Following our conversation in the morning, please look into the problem with the conversion of our MTZ data (for the 3ftt deposit) into structure factor CIF files. Apparently, instead of Fobs and sigmas, the Fcalc and phases are in the CIF file. To avoid these problems in the future, I propose that the CIF file containing structure factors be sent to authors for approval together with the other files. I hope that we can resolve this quickly, taking into account the current discussion on the ccp4 bulletin board. I will copy you on the discussion on the ccp4 bulletin board initiated by Gerard.
Best regards
Wladek
Best regards,
Wladek
Dr. Wladek Minor Professor of Molecular Physiology and Biological Physics Phone: 434-243-6865 Fax: 434-982-1616 http://krzys.med.virginia.edu/CrystUVa/wladek.htm US-mail address: Department of Molecular Physiology and Biological Physics University of Virginia PO Box 800736, Charlottesville, VA 22908-0736 Fed-Ex address: Department of Molecular Physiology and Biological Physics 1300 Jefferson Park Avenue University of Virginia Charlottesville, VA 22908
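The suspected failure mode, a converter falling back to the wrong columns when the expected labels are absent, suggests a simple guard. The sketch below is hypothetical and does not describe the actual PDB conversion pipeline: it assumes the column labels are available as a list of strings, and that F and SIGF are the labels the converter expects. FC and PHIC are illustrative names for calculated columns.

```python
def check_labels(labels, expected=("F", "SIGF")):
    """Report expected labels that are missing, with near-miss candidates."""
    missing = [name for name in expected if name not in labels]
    # Candidates are labels like "F_set1" that extend an expected name.
    candidates = {
        name: [label for label in labels if label.startswith(name + "_")]
        for name in missing
    }
    return missing, candidates


# Labels as described in the message; FC and PHIC are assumed names.
labels = ["H", "K", "L", "F_set1", "SIGF_set1", "FC", "PHIC"]
missing, candidates = check_labels(labels)
print(missing)
print(candidates)
```

Stopping with a question to the depositor when `missing` is non-empty, rather than silently picking other columns, would have surfaced the problem at deposition time.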
Re: [ccp4bb] 3ftt and gremlins
Dear All,
I just received information from Michael Gao (PDB) that an updated sf file is scheduled to be released on March 17, 2009 to replace the current incorrect sf file.
Wladek
Dr. Wladek Minor Professor of Molecular Physiology and Biological Physics Phone: 434-243-6865 Fax: 434-982-1616 http://krzys.med.virginia.edu/CrystUVa/wladek.htm US-mail address: Department of Molecular Physiology and Biological Physics University of Virginia PO Box 800736, Charlottesville, VA 22908-0736 Fed-Ex address: Department of Molecular Physiology and Biological Physics 1300 Jefferson Park Avenue University of Virginia Charlottesville, VA 22908