Re: [ccp4bb] 3ftt and gremlins

2009-03-15 Thread Felix Frolow
Pavel, I have a problem with the number of reflections that PHENIX reports
for refinement.
a. My data set is almost complete: 48071 reflections. I always keep
anomalous pairs separated.
PHENIX reports (in phenix.model_vs_data.log) 94475 reflections
(reasonable, taking into account 98% completeness and possible mismatches in
anomalous pairs). I assume that (+) and (-) are kept separate in this case.
b. Problems arise when depositing data to the PDB using their tools. They
do not (I think) accept anomalous-separated data, and they
complain that the number of reflections is inconsistent between the
reflection file and the PDB file header.
c. What happens in extreme cases? Say that for anisotropic data I have
78% completeness, with the anomalous data separated.
During refinement, will PHENIX use a complete set of reflections (for
calculating the various density maps), or will it use only WHAT IS THERE IN
THE INPUT FILE?


Dr  Felix Frolow
Professor of Structural Biology and Biotechnology
Department of Molecular Microbiology
and Biotechnology
Tel Aviv University 69978, Israel

Acta Crystallographica D, co-editor

e-mail: mbfro...@post.tau.ac.il
Tel:   ++972 3640 8723
Fax:  ++972 3640 9407
Cellular:   ++972 547 459 608
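A quick arithmetic check of the reflection counts in point a. is sketched below. It assumes that 48071 is the Friedel-merged unique count and that PHENIX counts I(+) and I(-) separately; the function names are hypothetical. Centric reflections have no independent Friedel mate, so they are counted only once, which is one more reason the anomalous count can fall short of exactly twice the unique count.

```python
def anomalous_separated_count(n_unique, n_centric=0):
    """Expected reflection count when I(+) and I(-) are kept separate.

    An acentric reflection and its Friedel mate are distinct, so each
    acentric reflection contributes two entries; a centric reflection's
    Friedel mate is symmetry-equivalent to it, so it contributes one.
    """
    n_acentric = n_unique - n_centric
    return 2 * n_acentric + n_centric


def completeness(n_observed, n_expected):
    """Fraction of the expected reflections actually present."""
    return n_observed / n_expected


# Counts quoted in the message. The number of centric reflections is not
# given, so it is left at zero (this gives the largest possible expectation).
expected = anomalous_separated_count(48071)
print(expected)                                  # 96142
print(round(completeness(94475, expected), 3))   # 0.983
```

The ratio of about 0.983 is consistent with the 98% completeness mentioned above.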



Re: [ccp4bb] 3ftt and gremlins

2009-03-13 Thread Felix Frolow
My personal experience (I frequently re-refine the structures I cite, if all
the necessary data exist in the PDB) is that the PDB contains a significant
number of artifacts supported not by reality but only by wild imagination.
These artifacts originate from modest, good and excellent laboratories alike.
They may not be as striking as tracing the protein main chain in reverse,
but sometimes they support quite significant and far-reaching conclusions.
I myself frequently suffered at the PDB structure-validation stage because
of the wrong treatment of anisotropic thermal parameters, and the erroneous
R-factor calculations that resulted from it during structure factor
validation. I think this problem is resolved now, or at least annotators no
longer bother me about it.
Personally, I don't believe that pointing fingers at individual cases of
misinterpretation and errors that are difficult to catch will help to
resolve the situation. Peer review of the data that enter the PDB is
unrealistic. Automatic re-refinement of the entire PDB content, in tune with
modern high-throughput-everything approaches, will not solve the problem
either; in the best cases it will produce slightly better refinement
statistics. Nothing can replace human-eye interpretation of the electron
density until the AI problem is solved. Responsibility for the correct
interpretation of a structure lies with those who publish it and with those
who cite it.
To sum up, I only wonder: why point the finger at 3ftt, why now, why in
public, and why so emotionally?





Re: [ccp4bb] 3ftt and gremlins

2009-03-12 Thread Eleanor Dodson
It would be possible for the deposition sites to run a few simple tests
to at least catch cases where intensities are labelled as amplitudes or
vice versa: the TRUNCATE plots of moments and cumulative intensity
distributions would show that something was wrong.


Eleanor
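The idea behind the moment test can be sketched as follows. This is not how TRUNCATE is implemented; it is a minimal illustration assuming acentric reflections that follow Wilson statistics with no twinning, and the function names and thresholds are hypothetical. For such data true intensities give <I^2>/<I>^2 = 2, amplitudes mislabelled as intensities give 4/pi (about 1.27), and intensities mislabelled as amplitudes (and then squared) give 6.

```python
import math
import random


def second_moment_ratio(values):
    """<x^2> / <x>^2 for a list of non-negative values."""
    n = len(values)
    mean = sum(values) / n
    mean_sq = sum(v * v for v in values) / n
    return mean_sq / (mean * mean)


def guess_data_type(values):
    """Crude guess at what a column treated as 'intensity' really holds.

    Expected ratios for acentric Wilson data (no twinning):
      true intensities I                 -> 2
      amplitudes F mislabelled as I      -> 4/pi ~ 1.27
      intensities mislabelled as F, then squared -> 6
    The thresholds are arbitrary cut-offs chosen for illustration.
    """
    r = second_moment_ratio(values)
    if r < 1.6:
        return "looks like amplitudes"
    if r < 3.5:
        return "looks like intensities"
    return "looks like squared intensities"


# Simulated acentric Wilson intensities: exponentially distributed, mean 1.
rng = random.Random(0)
intensities = [rng.expovariate(1.0) for _ in range(100_000)]
amplitudes = [math.sqrt(i) for i in intensities]
squared = [i * i for i in intensities]

for name, data in [("I", intensities), ("F", amplitudes), ("I^2", squared)]:
    print(name, round(second_moment_ratio(data), 2), guess_data_type(data))
```

As Gerard notes in the next message, a check of this kind looks only at the amplitude/intensity column, so it would not catch a sigma column that actually holds something else.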




Re: [ccp4bb] 3ftt and gremlins

2009-03-12 Thread Gerard Bricogne
Dear Eleanor, 

 That is a useful suggestion, but in the case of 3ftt it would not have
helped: the amplitudes would have looked as healthy as can be (they were
calculated!), and it was the associated Sigmas that had absurd values, being
in fact phases in degrees. A sanity check on some (recalculated) I/sig(I)
statistics could have detected that something was fishy. 

 Looking forward to the archiving of the REAL data ... i.e. the images.
Using any other form of data is like having to eat out of someone else's
dirty plate!


 With best wishes,
 
  Gerard.



-- 

 ===
 Gerard Bricogne                    g...@globalphasing.com
 Global Phasing Ltd.
 Sheraton House, Castle Park        Tel: +44-(0)1223-353033
 Cambridge CB3 0AX, UK              Fax: +44-(0)1223-366889
 ===
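The recalculated I/sig(I) sanity check suggested above might look like this minimal sketch. I is taken as F^2 and sig(I) as 2F*sig(F) (first-order error propagation), so I/sig(I) = F/(2*sig(F)). The function names, the 1% tolerance and the extra "phase-like range" test are hypothetical heuristics, not an existing deposition check.

```python
import random
import statistics


def recalculated_i_over_sigi(f_obs, sig_f):
    """I/sig(I) from amplitudes: I = F^2, sig(I) = 2*F*sig(F),
    so I/sig(I) = F / (2*sig(F)). Pairs with a zero sigma are skipped."""
    return [f / (2.0 * s) for f, s in zip(f_obs, sig_f) if s != 0]


def sigma_sanity_warnings(f_obs, sig_f, max_bad_fraction=0.01):
    """Heuristic checks that a 'sigma' column really holds sigmas."""
    warnings = []
    ratios = recalculated_i_over_sigi(f_obs, sig_f)
    n_bad = sum(1 for r in ratios if r <= 0)
    if n_bad > max_bad_fraction * len(ratios):
        warnings.append(f"{n_bad} of {len(ratios)} reflections have I/sig(I) <= 0")
    lo, hi = min(sig_f), max(sig_f)
    # Phases in degrees typically fill most of [-180, 180] or [0, 360].
    if lo >= -180.0 and hi <= 360.0 and hi - lo > 300.0:
        warnings.append("sigma values span a range typical of phases in degrees")
    return warnings


# Synthetic data: plausible sigmas versus a 'sigma' column filled with phases.
rng = random.Random(1)
f_obs = [rng.uniform(50.0, 2000.0) for _ in range(5000)]
good_sigmas = [0.03 * f + 2.0 for f in f_obs]
phase_sigmas = [rng.uniform(-180.0, 180.0) for _ in f_obs]

print(sigma_sanity_warnings(f_obs, good_sigmas))   # []
print(sigma_sanity_warnings(f_obs, phase_sigmas))
print(round(statistics.median(recalculated_i_over_sigi(f_obs, good_sigmas)), 1))
```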


Re: [ccp4bb] 3ftt and gremlins

2009-03-12 Thread Wladek Minor



Eleanor,
Please note that so far the deposition sites have not sent CIF structure
factor files to authors for verification.
I spoke with Helen and John last Saturday and they promised me they would
change that policy. Alternatively, one can check one's new deposit every
Wednesday morning.

Best regards
Wladek



Re: [ccp4bb] 3ftt and gremlins

2009-03-12 Thread Wladek Minor



Dear Eleanor and Gerard,
PDB testing is performed on the MTZ file, so it cannot detect errors
introduced when that file is converted.

Wladek
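One way to catch conversion errors is a round-trip comparison: read the values back out of the converted file and compare them, reflection by reflection, with the source MTZ. The sketch below assumes both files have already been parsed into dictionaries keyed by Miller index (the parsing is not shown); the function name and tolerance are hypothetical.

```python
import math


def compare_reflection_tables(source, converted, rel_tol=1e-3):
    """Compare two reflection tables keyed by Miller index (h, k, l).

    Each table maps (h, k, l) -> (F, sigF). Returns the indices missing from
    the converted table and those whose F or sigF disagree beyond rel_tol.
    """
    missing = sorted(set(source) - set(converted))
    mismatched = []
    for hkl, (f_src, s_src) in source.items():
        if hkl not in converted:
            continue
        f_cnv, s_cnv = converted[hkl]
        same_f = math.isclose(f_src, f_cnv, rel_tol=rel_tol)
        same_s = math.isclose(s_src, s_cnv, rel_tol=rel_tol)
        if not (same_f and same_s):
            mismatched.append(hkl)
    return missing, sorted(mismatched)


# Hypothetical values: the "bad" table mimics a converter that picked up
# Fcalc and a phase column instead of Fobs and sigF.
source = {(1, 0, 0): (120.5, 3.2), (0, 1, 1): (85.0, 2.1), (2, 1, 0): (40.2, 1.9)}
bad = {(1, 0, 0): (118.9, -42.0), (0, 1, 1): (87.3, 171.5), (2, 1, 0): (39.0, 12.0)}

print(compare_reflection_tables(source, dict(source)))  # ([], [])
print(compare_reflection_tables(source, bad))
```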



Re: [ccp4bb] 3ftt and gremlins

2009-03-12 Thread Frank von Delft

Gerard Bricogne wrote:

 Looking forward to the archiving of the REAL data ... i.e. the images.
Using any other form of data is like having to eat out of someone else's
dirty plate!
  
That may be so -- but if I'm hungry now, I just pop it in the sink -- I 
don't publish a call for tenders on an industrial-scale dish-washer, 
call up the architects and engineers to redesign the room, re-lay the 
plumbing, vamp up my electricity transformer and install a new drainage 
system.


Which doesn't mean the industrial-scale washer isn't necessary;  but 
honestly, can't we start by just washing the plate??


phx.


Re: [ccp4bb] 3ftt and gremlins

2009-03-12 Thread Dale Tronrud
   This thread has evolved into two different topics.  Just to
clarify:

1)  There is a need for additional validation of structure factor
depositions.

   My recollection is that the output of SF Check is available to
the depositor via ADIT on the RCSB site.  I have found that report
to be quite helpful in checking for gross errors in my structure
factor files.

   The Electron Density Server performs similar checks.  It shows
that the R value for 3ftt is 6.4% with a correlation coefficient
between Fo and Fc of 0.996.

   The EDS flags entries as interesting if the calculated R value
is more than 5% higher than the reported R value.  Maybe it should
also note when the R value is more than 5% lower.

   The tools for validating structure factors exist, but perhaps they could
be put more prominently in front of depositors, to encourage them more
strongly to look at the results.

2) It would be useful to have a central repository of raw diffraction
   images.

   Most of the discussion on this point has been about the technical
difficulty of storing this quantity of data.  What has not been mentioned
is the much greater difficulty of validating these images.  You may think
the images for an entry have been deposited, only to find out that
the investigator's wedding photos were accidentally deposited instead.

   Validating that the images correspond to the claimed structure
will be an enormous task;  probably more difficult than coming up
with enough hard drives to store them all.

Dale Tronrud
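The R value and Fo/Fc correlation reported by the EDS, and the symmetric flag suggested in point 1), can be sketched as below. The simple linear scale factor (no B-factor term) and reading "5%" as an absolute difference of 0.05 in R are assumptions; the EDS may scale and compare differently, and the reported R of 0.20 in the example is a hypothetical number.

```python
import math


def scale_factor(f_obs, f_calc):
    """Simple linear scale k = sum(Fo) / sum(Fc), with no B-factor term."""
    return sum(f_obs) / sum(f_calc)


def r_value(f_obs, f_calc):
    """R = sum |Fo - k*Fc| / sum Fo."""
    k = scale_factor(f_obs, f_calc)
    return sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_value_flag(r_calc, r_reported, tolerance=0.05, also_flag_lower=True):
    """Flag entries whose recalculated R differs too much from the reported R."""
    diff = r_calc - r_reported
    if diff > tolerance:
        return "calculated R much higher than reported"
    if also_flag_lower and diff < -tolerance:
        return "calculated R much lower than reported"
    return None


# "Observed" amplitudes that are just scaled Fcalc give R = 0 and CC = 1.
fo = [100.0, 80.0, 60.0, 40.0]
fc = [50.0, 40.0, 30.0, 20.0]
print(r_value(fo, fc), correlation(fo, fc))  # 0.0 1.0
print(r_value_flag(0.064, 0.20))             # calculated R much lower than reported
```

Setting `also_flag_lower=False` reproduces the one-sided behaviour described above.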



Re: [ccp4bb] 3ftt and gremlins

2009-03-12 Thread Gerard Bricogne
Dear Frank,

 Thank you for your answer, as inimitable as ever. We do of course have
to wash one plate at a time when we each feel the pinch of hunger; but as we
do so we should not forget that the PDB is the Central Planning Office, and
that the order for that industrial-scale dishwasher has to be filed and
lobbied for if it is ever to be delivered. Otherwise we will continue
washing our own plates for ever.

 Besides, on the serious side of the argument, there are other benefits
to the deposition of images, as argued in the paper by Joosten et al. in the
recent Acta D volume containing the proceedings of last year's CCP4 Study
Weekend.


 With best wishes,
 
  Gerard.




Re: [ccp4bb] 3ftt and gremlins

2009-03-12 Thread Pavel Afonine

Hi Dale,


1)  There is a need for additional validation of structure factor
depositions.
  


PHENIX has tools for this:

1) phenix.cif_as_mtz will convert the PDB data file with diffraction
data into an MTZ file. It will automatically figure out whether the data are
X-ray Iobs or Fobs, or neutron Fobs or Iobs.


2) The next step is to run phenix.model_vs_data, which will take the MTZ
file from above and the corresponding PDB file and give complete statistics
that you can immediately compare with the published values.


Note that phenix.model_vs_data can handle:

- twinned data;
- neutron data;
- unknown ligands (all dictionaries are generated internally);
- PDB files with multiple models (with multiple MODEL records).

I run it every month or two, so I have a nice list of interesting
cases. The database of all the data files converted to MTZ is used
internally by PHENIX developers for various developments.


In fact, this was used in the POLYGON validation tool: Acta Cryst. (2009).
D65, 297-300.


Pavel.


Re: [ccp4bb] 3ftt and gremlins

2009-03-12 Thread Gerard Bricogne
Dear Dale,


On Thu, Mar 12, 2009 at 11:07:05AM -0700, Dale Tronrud wrote:
This thread has evolved into two different topics.  Just to
 clarify:
 
 1)  There is a need for additional validation of structure factor
 depositions.
 
My recollection is that the output of SF Check is available to
 the depositor via ADIT on the RCSB site.  I have found that report
 to be quite helpful in checking for gross errors in my structure
 factor files.
 
The Electron Density Server performs similar checks.  It shows
 that the R value for 3ftt is 6.4% with a correlation coefficient
 between Fo and Fc of 0.996.
 
The EDS flags entries as interesting if the calculated R value
 is more than 5% higher than the reported R value.  Maybe it should
 also note when the R value is more than 5% lower.
 
The tools for validating structure factors exist but perhaps could
 be put more in the face of the depositor to more strongly encourage
 that they be looked at.
 
 2) It would be useful to have a central repository of raw diffraction
images.
 
Most of the discussion on this point is the technical difficulty of
 storing this quantity of data.  What has not been mentioned is the
 much greater difficulty of validating these images.  You may think
 the images for an entry have been deposited only to find out that
 the investigator's wedding photos were accidentally deposited instead.
 

 My suggestion would be to give the images to (say) XDS: it would run
successfully on wedding photographs only in rare cases where the group
photograph was taken from a helicopter and the guests were arranged in very
peculiar ways ... .



Validating that the images correspond to the claimed structure
 will be an enormous task;  probably more difficult than coming up
 with enough hard drives to store them all.
 

 Not necessarily, unless the crystalline specimen is very poor. In
ordinary cases, instead of comparing structure factor amplitudes or
intensities calculated from the deposited model to those in a file of
deposited values, one would run an integration program (or several of them)
on the images, check that cell parameters and space group agree, then run
TRUNCATE if amplitudes are desired, to get those observed values (up to some
re-indexing). For this to be possible automatically one would have to be
much stricter with the completeness and accuracy of the information in image
headers produced by various detectors, a step that I think many people would
welcome.


 With best wishes,
 
  Gerard.
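The "check that cell parameters and space group agree" step could be automated roughly as follows, once an integration program has produced a cell and space group from the images. The tolerances (1% on lengths, 1 degree on angles), the whitespace-only normalisation of space-group symbols and all function names are hypothetical; alternative settings and re-indexing are not handled.

```python
def cells_agree(cell_a, cell_b, length_tol=0.01, angle_tol_deg=1.0):
    """Compare unit cells given as (a, b, c, alpha, beta, gamma).

    Lengths must agree to a fractional tolerance and angles to an absolute
    tolerance in degrees. Both tolerances are illustrative guesses.
    """
    lengths_ok = all(
        abs(x - y) <= length_tol * max(x, y) for x, y in zip(cell_a[:3], cell_b[:3])
    )
    angles_ok = all(
        abs(x - y) <= angle_tol_deg for x, y in zip(cell_a[3:], cell_b[3:])
    )
    return lengths_ok and angles_ok


def normalise_space_group(symbol):
    """Drop whitespace so 'P 21 21 21' and 'P212121' compare equal."""
    return "".join(symbol.split())


def images_consistent_with_entry(entry_cell, entry_sg, integrated_cell, integrated_sg):
    """True if the integration result agrees with the deposited entry."""
    same_sg = normalise_space_group(entry_sg) == normalise_space_group(integrated_sg)
    return same_sg and cells_agree(entry_cell, integrated_cell)


# Hypothetical numbers for illustration only.
entry = (50.0, 60.0, 70.0, 90.0, 90.0, 90.0)
from_images = (50.2, 59.9, 70.3, 90.0, 90.0, 90.0)
print(images_consistent_with_entry(entry, "P 21 21 21", from_images, "P212121"))  # True
```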





Re: [ccp4bb] 3ftt and gremlins

2009-03-12 Thread Robbie Joosten
Dear Dale,

 1)  There is a need for additional validation of structure factor
 depositions.

My recollection is that the output of SF Check is available to
 the depositor via ADIT on the RCSB site.  I have found that report
 to be quite helpful in checking for gross errors in my structure
 factor files.

The Electron Density Server performs similar checks.  It shows
 that the R value for 3ftt is 6.4% with a correlation coefficient
 between Fo and Fc of 0.996.

The EDS flags entries as interesting if the calculated R value
 is more than 5% higher than the reported R value.  Maybe it should
 also note when the R value is more than 5% lower.

The tools for validating structure factors exist but perhaps could
 be put more in the face of the depositor to more strongly encourage
 that they be looked at.
Thank you for stressing this again. I had a look at the PDB_REDO results
for 3ftt. PDB_REDO uses a procedure similar to that of the EDS, so I wanted
to check whether 3ftt had been rejected because R(-free) could not be
reproduced. However, 3ftt was not in PDB_REDO for a different reason: the
R-free set had no information content. That is, all reflections have the
same status flag.
There are frequent discussions on the CCP4BB about the (un)importance of
keeping the R-free set. It would certainly be nice if the PDB also warned
depositors about R-free set problems.
Funny thing about the status flags of 3ftt: all reflections are marked 'x'
(i.e. unmeasured). We should have known that there was something wrong
with them.

Cheers,
Robbie Joosten
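A depositor-side warning about an uninformative R-free set could be as simple as counting the distinct status flags. The sketch assumes the usual mmCIF convention that 'o' marks working-set and 'f' marks free-set reflections; the function name and returned fields are hypothetical.

```python
from collections import Counter


def rfree_flag_summary(status_flags, free_flag="f"):
    """Summarise per-reflection status flags (mmCIF-style, assumed 'o'/'f').

    The set is treated as informative only if more than one flag value
    occurs and at least one reflection carries the free-set flag.
    """
    counts = Counter(status_flags)
    n = len(status_flags)
    n_free = counts.get(free_flag, 0)
    return {
        "distinct_flags": sorted(counts),
        "free_fraction": n_free / n if n else 0.0,
        "informative": len(counts) > 1 and n_free > 0,
    }


# Every reflection carries the same flag, as described for 3ftt.
print(rfree_flag_summary(["x"] * 1000))
# {'distinct_flags': ['x'], 'free_fraction': 0.0, 'informative': False}
print(rfree_flag_summary(["o"] * 950 + ["f"] * 50))
```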


[ccp4bb] 3ftt and gremlins

2009-03-11 Thread Wladek Minor



Dear Michael,
As we already wrote to Helen and John, the structure factors for our deposit
(PDB ID 3FTT and RCSB ID RCSB051033) were mis-processed. Currently, as was
pointed out on the CCP4BB, the PDB reports calculated amplitudes and phases
instead of the experimental amplitudes and sigmas. Most probably our deposited
MTZ file was processed wrongly because of non-standard labels for some
data (F_set1 instead of F and SIGF_set1 instead of SIGF). Could you send
me, as soon as possible, information on how our data file was processed, and
correct our deposit? If necessary, I will send you the file with the
structure factors.
I am including the letter that I sent to Helen and John on Saturday
morning:
Dear Helen and John,
Following our conversation this morning, please look into the problem
with the conversion of our MTZ data (for the 3ftt deposit) into structure
factor CIF files. Apparently, the CIF file contains Fcalc and phases
instead of Fobs and sigmas.
To avoid such problems in the future, I propose that the CIF file
containing structure factors be sent to authors for approval together
with the other files. I hope that we can resolve this quickly, taking into
account the current discussion on the CCP4 bulletin board.
I will forward you the discussion on the CCP4 bulletin board initiated by
Gerard.
Best regards
Wladek 

Best regards,
Wladek

Dr. Wladek Minor
Professor of Molecular Physiology and Biological Physics
Phone: 434-243-6865
Fax: 434-982-1616

http://krzys.med.virginia.edu/CrystUVa/wladek.htm
US-mail address: 
Department of Molecular Physiology and Biological Physics
University of Virginia
PO Box 800736, Charlottesville, VA 22908-0736
Fed-Ex address: 
Department of Molecular Physiology and Biological Physics
1300 Jefferson Park Avenue
University of Virginia
Charlottesville, VA 22908 
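The thread does not say how the PDB converter chose its columns, so the following is only a hypothetical illustration of a safer strategy: select the observed amplitude and its sigma by MTZ column type ('F' amplitude, 'Q' standard deviation, 'P' phase) rather than by label, and refuse to guess when the choice is ambiguous. The assumption that a sigma column immediately follows its amplitude, the example column list, and the function name are all illustrative.

```python
def pick_fobs_columns(columns):
    """Choose the observed-amplitude column and its sigma from MTZ-style
    (label, type) pairs by column type rather than by label.

    A candidate is an 'F' (amplitude) column immediately followed by a
    'Q' (standard deviation) column. An 'F' column followed by a 'P'
    (phase) column, as for Fcalc/PHIcalc, is never accepted. Raises
    ValueError when there is no candidate or more than one.
    """
    candidates = [
        (columns[i][0], columns[i + 1][0])
        for i in range(len(columns) - 1)
        if columns[i][1] == "F" and columns[i + 1][1] == "Q"
    ]
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one F/SIGF pair, found {candidates}")
    return candidates[0]


# Hypothetical column list: calculated data plus non-standard Fobs labels.
columns = [
    ("H", "H"), ("K", "H"), ("L", "H"),
    ("FC", "F"), ("PHIC", "P"),
    ("F_set1", "F"), ("SIGF_set1", "Q"),
]
print(pick_fobs_columns(columns))  # ('F_set1', 'SIGF_set1')
```

Raising an error instead of silently picking a column means a non-standard file is sent back to a person rather than being mis-processed.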



Re: [ccp4bb] 3ftt and gremlins

2009-03-11 Thread Wladek Minor



Dear All,
I just received information from Michael Gao (PDB) that an updated SF file
is scheduled to be released on March 17, 2009, replacing the current
incorrect SF file.
Wladek
