Re: [ccp4bb] How to fix ARG planarity outliers?

2023-08-17 Thread Dale Tronrud

Hi,

   I would like to support Dr Yamashita's statement and emphasize that 
Arg side chains really do deviate from planarity from time to time, and 
it is possible that your side chain is an "outlier" because it really is 
non-planar. In the paper Yamashita mentioned ("Arginine off-kilter: 
guanidinium is not as planar as restraints denote.", Acta Cryst. 2020, 
D76, 1159-1166) we gave as one example Arg 242 in 2xfr in which the 
chi-5 angle is 22 deg from planarity and completely supported by its 
density.


   When you have an "outlier" in a model despite restraints you should 
not simply increase the weights but try to understand what is driving 
your model away from the restraint.  Something about your structure is 
fighting against the restraint and understanding what that is is the 
key.  Maybe there is an error in some nearby side chain creating a bad 
contact or maybe something else. Maybe it is reality peeking through. 
Simply increasing the weight only hides the issue and may actually 
create its own error in your model.  (The hardest errors to identify are 
those that are inappropriately consistent with the validation criteria!)


Dale Tronrud

On 8/17/2023 8:03 AM, Keitaro Yamashita wrote:

Hi,

I assume the planarity you mentioned is what is discussed in Moriarty
et al (2020) https://doi.org/10.1107/S2059798320013534

As shown there, it is known to have some deviation. In Refmac (or in
the CCP4 monomer library) it is restrained using a torsion angle with
5 degree sigma instead of the planarity restraint. You can tighten the
restraint using an even smaller sigma. The keyword could be:

restr tors include resi ARG name chi5 sigma 2.0

By the way, did you mean by "outliers" those from the PDB validation
report? I have had a feeling that their criterion is a bit too strict.
Could anyone from the PDB tell how the outlier of the ARG sidechain is
calculated?

Best regards,
Keitaro

On Thu, 17 Aug 2023 at 15:02, Vladyslav Yadrykhinsky wrote:


Hello,

I am refining a structure and I have ARG planarity outliers in the side chain.
Could someone please tell me how I should set the planarity restraint in 
REFMAC5 (version 5.8.0403) to correct them?

Should I use: plane [value1] [value2] in the advanced settings? If so, what 
should values 1 and 2 be? My current restraint weight is 0.128.

Please let me know if you need more information; I would appreciate any 
assistance on the matter.

Best regards,
Vladyslav





Re: [ccp4bb] Patenting ligand binding?

2023-07-28 Thread Dale Tronrud
   There are certainly a large number of bad patents being issued, 
particularly in the US.  The patent office here seems to have decided 
that it is easier (for them) to just approve most every application and 
let the courts filter out the bad ones, which requires years of work for 
lawyers and boatloads of money.  There is a serious need for reform of 
the process.


   On the other hand, my experience with the technology transfer people 
at the University of Oregon (when I used to work there) is that they, 
(and most universities) are very careful to ensure the proposed patent's 
utility and validity when making an application because the cost of 
filing is quite high relative to their budget.


   The patent application in the current discussion begins with a 
detailed description of techniques that we all have used for many years 
and are obvious to us.  This description, however, is just the 
introduction to the patent and not a listing of the methods being 
patented.  The first section just gives the context in which the methods 
should be considered.


   On page 7 there is the section "DETAILED DESCRIPTION OF THE 
INVENTION" and that description is "A method for soaking ligands into 
macromolecule (e.g. protein) crystals (e.g. microcrystals) on EM (e.g. 
TEM) grids is presented. One or more crystals on the grid are soaked 
simultaneously using standard cryo-EM vitrification equipment."  So, the 
only claim of novelty is for soaking in the ligand in a specific way 
after the crystals are placed on the grid.  I'm not qualified to say if 
this is actually novel and unobvious, but the application seems to me to 
be very narrow and specific and NOT a blanket claim of performing 
structural biology using electron scattering.


Dale Tronrud

On 7/28/2023 12:45 AM, Winter, Graeme (DLSLtd,RAL,LSCI) wrote:

Interesting

https://www.freepatentsonline.com/20230228695.pdf


Patent for use of electron diffraction to assess ligand binding

Stumbled across this because the patent application cites my work - felt 
that this would be of interest to the community


… discuss?

Graeme








Re: [ccp4bb] Structure prediction - waiting to happen

2023-04-01 Thread Dale Tronrud

Hi,

   Just ask ChatGPT to write it for you!

Dale Tronrud



On 4/1/2023 5:06 AM, Subramanian, Ramaswamy wrote:

Dear All,

I am unsure if all other groups will get it - but I am sure this group 
will understand the frustration.


My NIH grant did not get funded.  A few genuine comments - they make 
excellent sense.  We will fix that.


One major comment is, “Structures can be predicted by AlphaFold and 
other software accurately, so the effort put on the grant to get 
structures by X-ray crystallography/cryo-EM is not justified.”


The problem is when a company with billions of $$s develops a method and 
blasts it everywhere - the message is so pervasive…


Question: Is there a canned consensus paragraph, with references, that one 
can add to grants involving structural biology (especially if the review 
group is not a structural biology group) to say why the most modern 
structure prediction programs are not a substitute for structural work?


Thanks.


Rams
subra...@purdue.edu








Re: [ccp4bb] To Trim or Not to To Trim

2023-03-18 Thread Dale Tronrud
   I'm going to dive back in here again to expand this discussion. 
Whether this diversion clarifies or obscures issues surrounding the 
"crystallographers' dilemma" I'll leave for others to decide.


   There is currently considerable discussion, among people who care 
about cell phone cameras, over the behavior of the cameras in some 
Samsung cell phones when photos are taken that include the Moon.


https://www.reddit.com/r/Android/comments/11nzrb0/samsung_space_zoom_moon_shots_are_fake_and_here/

   In this post, evidence is presented that when a photo is taken with one 
of these phone cameras and includes the Earth's largest satellite, the 
image shows a properly exposed and detailed view of the orb.  This 
despite the fact that the test photograph described in the post is of a 
tableau containing a deliberately blurred photo of the cratered globe. 
The claim is made that the Samsung app is adding information from 
sources other than the camera's light sensor and therefore the image is 
"fake".


   I expect that Samsung would reply that, once the app is confident 
that the silver disk in the image is the nighttime traditional symbol of 
romance, it is perfectly reasonable to make that, now identified aerial, 
phenomenon appear in the image as expected by every sighted human in the 
history of our species.  There have been billions of high quality photos 
of the silicate sphere taken. (This is both literally true and a gross 
underestimate.)  How can the photo be fake if it better reflects what 
the photographer saw than what can be deduced from only the raw pixels 
of the sensor?


   Of course, this example differs only in degree from common practice 
going back to the beginning of photography.  Photos have always been 
modified, sometimes in order to deceive the viewer, but most often to 
make the photos more like what the photographer believed the scene 
actually looked like.  For example, in nearly every photo I take I 
"correct" the color balance.


   Is the photo with a detailed Moon fake?  Are my photos taken at the 
forest floor, but without everything being some shade of green, fake?  I 
think most people would be satisfied if there was a way for them to know 
what sources of information were used in creating the image.


   We, as scientists, are much more demanding of our PDB models.  We 
build better models when we use all the knowledge at our disposal.  If 
we are interpreting a 9A resolution map of hemoglobin and see a 
disk-shaped piece of density where we know the heme goes, we are 
perfectly justified to build an atomic model of heme.  We are also 
obliged to make clear that the exact atomic positions, bond lengths and 
angles, were not derived from that map, just as a journalist needs to 
make clear to the reader that their photo has been processed to include 
detail which was not present when the image was "taken".


   I have deposited models that contained features which were only 
"consistent" with the electron density but supported by enough other 
forms of evidence to make me confident in their existence.  I have done 
my best to make the justification of these models clear in the reports I 
have written but continue to be frustrated by the lack of tools to 
represent the precise interplay of data sources that support my model 
WITHIN the deposition.  I am not so naive as to believe that everyone who 
has cited my papers has actually read them.


Dale E. Tronrud


On 3/10/2023 1:05 AM, Julia Griese wrote:

Hi all,

My impression has been that the most common approach these days is to 
“let the B-factors take care of it”, but I might be wrong. Maybe it’s 
time to run another poll?


Personally, I call any other approach R-factor cosmetics. The goal in 
model building is not to achieve the lowest possible R-factors, it’s to 
build the most physically meaningful, most likely to be correct, model. 
So if you know that the side chain is part of the protein, you should 
model it the best way you can. If it’s there, just disordered, then the 
most correct way to model it is to let it have high B-factors. Most 
molecular graphics programs don’t flag zero-occupancy atoms, so the user 
might never notice. Truncation of a side chain, unless there is evidence 
that it really physically isn’t there, is also misleading, in my 
opinion. I don’t believe that it is more helpful to the non-expert user 
than high B-factors either.


If people who are not structural biologists themselves don’t know how to 
use a structure, then we need to educate them better. It is very 
straightforward these days to look at electron density in the PDB 
viewer. It used to be difficult, but nowadays there’s no excuse for not 
checking the electron density. The PDB validation flags RSRZ outliers. 
You can easily colour a structure by B-factors. It doesn’t take that 
much effort to teach students how to validate structures. The main point 
you need to get across is that it is necessary to do so. And this needs 
to be done 

Re: [ccp4bb] To Trim or Not to To Trim

2023-03-10 Thread Dale Tronrud

Hi

   As a frequent contributor to prior discussions on this same topic I 
would like to broaden the discussion a bit.  I'm sorry to say that most 
of the comments on this thread are exactly the same positions that have 
been expressed many times over the years.  I don't want to spend time, 
again, retyping my opinions on how I prefer to torment the parameters of 
my models to express what I believe is going on inside my crystals.


   The fundamental problem is that the parameters we are forced to use, 
in our PDB depositions and in the refinement and model building programs 
available to us, are wholly inadequate.  We cannot accurately (or 
precisely) describe what we are envisioning for the surface side 
chains, and sometimes entire stretches of main chain, in our proteins. 
We can continue to argue with each other year after year, but there is 
no solution to this problem other than changing the nature of PDB models 
and allowing a reasonable description of multi-conformation models.


   I believe it is fair to say that the consensus after a previous 
round of this discussion was that, at the very least, we need a flag for 
each atom which indicates whether that atom was placed based on electron 
density or simply to make a chemically complete set of atoms for that 
type of monomer.  I haven't looked but I think that was about five or 
ten years ago.  Since then the PDB has made major changes to the 
structure of PDB entries that will require most software for analysis of 
macromolecular models be rewritten and right now that organization is 
making a major push to get us to virtually attend a workshop to help us 
make this transition.  And yet I don't think there is anything in this 
new data dictionary to help us with this important but intractable 
problem.


   Unless the PDB gives us the parameters we need to properly describe 
a macromolecular model, and the refinement/model building developers 
give us the tools to make use of them, we will be back here again, every 
five years or so, rehashing this debate over exactly the same, 
irreconcilably poor, solutions to this problem.


Dale E. Tronrud


On 3/10/2023 1:05 AM, Julia Griese wrote:

Hi all,

My impression has been that the most common approach these days is to 
“let the B-factors take care of it”, but I might be wrong. Maybe it’s 
time to run another poll?


Personally, I call any other approach R-factor cosmetics. The goal in 
model building is not to achieve the lowest possible R-factors, it’s to 
build the most physically meaningful, most likely to be correct, model. 
So if you know that the side chain is part of the protein, you should 
model it the best way you can. If it’s there, just disordered, then the 
most correct way to model it is to let it have high B-factors. Most 
molecular graphics programs don’t flag zero-occupancy atoms, so the user 
might never notice. Truncation of a side chain, unless there is evidence 
that it really physically isn’t there, is also misleading, in my 
opinion. I don’t believe that it is more helpful to the non-expert user 
than high B-factors either.


If people who are not structural biologists themselves don’t know how to 
use a structure, then we need to educate them better. It is very 
straightforward these days to look at electron density in the PDB 
viewer. It used to be difficult, but nowadays there’s no excuse for not 
checking the electron density. The PDB validation flags RSRZ outliers. 
You can easily colour a structure by B-factors. It doesn’t take that 
much effort to teach students how to validate structures. The main point 
you need to get across is that it is necessary to do so. And this needs 
to be done not only in courses aimed at prospective experimental 
structural biologists, of course, but whenever students use structures 
in any way.


This is just the opinion of someone who feels very strongly about 
teaching structure validation and rejoices when students’ reply to the 
question “What was the most important thing you learned today?” is: 
“Don’t blindly trust anything.”


Cheers

/Julia

--

Dr. Julia Griese

Associate Professor (Docent)

Principal Investigator

Department of Cell and Molecular Biology

Uppsala University

BMC, Box 596

SE-75124 Uppsala

Sweden

email: julia.gri...@icm.uu.se

phone: +46-(0)18-471 4982

http://www.icm.uu.se/structural-biology/griese-lab/ 



*From: *CCP4 bulletin board  on behalf of 
Bernhard Lechtenberg <968307750321-dmarc-requ...@jiscmail.ac.uk>

*Reply-To: *Bernhard Lechtenberg 
*Date: *Friday, March 10, 2023 at 05:07
*To: *"CCP4BB@JISCMAIL.AC.UK" 
*Subject: *Re: [ccp4bb] To Trim or Not to To Trim

I found the poll I wrote about earlier. This actually is way older than 
I had expected (2011). You can see the results of the poll (which was run by 
Ed Pozharski) and the discussion at the time here in the CCP4BB archive:


https://www.mail-archive.com/ccp4bb@jiscmail.ac.uk/msg20268.html 

Re: [ccp4bb] outliers

2022-11-09 Thread Dale Tronrud
   And now it is time for an "old man story".  Back in the early 1990's 
the Brookhaven PDB started to worry about "validating" the models being 
deposited.  One of the things they implemented was to add to the header 
of the PDB a complete list of all bond lengths and angles that deviated 
from the library value by more than 3 sigma.


   In Brian Matthews' lab a student solved the structure of 
beta-galactosidase which is composed of over a thousand residues and the 
crystal has 16-fold ncs.  The model had over 130,000 atoms, a record for 
the time.  The PDB declared that this was one of the worst models they 
had ever seen because it had hundreds of geometry restraints violated by 
greater than 3 sigma.  The list in their header went on and on.


   Our response, of course, was that this model had over 130,000 bonds 
and 180,000 angles, and if you assume a Normal distribution the number of 
3 sigma deviants was exactly the number expected - which is what the 
geometry rmsd's were saying.
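
   For scale, a minimal Python sketch of that back-of-the-envelope argument 
(an illustration only; it assumes the deviations are Normal with the library 
sigmas and uses the bond and angle counts quoted above):

# Expected number of >3-sigma geometry "outliers" in a model of this size,
# assuming the deviations follow a Normal distribution.
from math import erf, sqrt

p_outlier = 1.0 - erf(3.0 / sqrt(2.0))   # P(|z| > 3) for a standard Normal, ~0.0027

n_bonds = 130_000
n_angles = 180_000

print(f"P(|z| > 3)              = {p_outlier:.5f}")
print(f"expected bond outliers  = {p_outlier * n_bonds:.0f}")
print(f"expected angle outliers = {p_outlier * n_angles:.0f}")

Hundreds of "violations" is exactly what a correctly restrained model of 
that size should show.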


Dale E. Tronrud

On 11/8/2022 3:25 PM, James Holton wrote:

Thank you Ian for your quick response!

I suppose what I'm really trying to do is put a p-value on the 
"geometry" of a given PDB file.  As in: what are the odds the deviations 
from ideality of this model are due to chance?


I am leaning toward the need to take all the deviations in the structure 
together as a set, but, as Joao just noted, that it just "feels wrong" 
to tolerate a 3-sigma deviate.  Even more wrong to tolerate 4 sigma, 5 
sigma. And 6 sigma deviates are really difficult to swallow unless you 
have trillions of data points.


To put it down in equations, is the p-value of a structure with 1000 
bonds in it with one 3-sigma deviate given by:


a)  p = 1-erf(3/sqrt(2))
or
b)  p = 1-erf(3/sqrt(2))**1000
or
c) something else?



On 11/8/2022 2:56 PM, Ian Tickle wrote:

Hi James

I don't think it's meaningful to ask whether the deviation of a single 
bond length (or anything else that's single) from its expected value 
is significant, since as you say there's always some finite 
probability that it occurred purely by chance.  Statistics can only 
meaningfully be applied to samples of a 'reasonable' size.  I know 
there are statistics designed for small samples but not for samples of 
size 1 !  It's more meaningful to talk about distributions.  For 
example if 1% of the sample contained deviations > 3 sigma when you 
expected there to be only 0.3 %, that is probably significant (but it 
still has a finite probability of occurring by chance), as would be 
finding no deviations > 3 sigma (for a reasonably large sample to 
avoid sampling errors).


Cheers

-- Ian


On Tue, Nov 8, 2022, 22:22 James Holton  wrote:

OK, so let's suppose there is this bond in your structure that is
stretched a bit.  Is that for real? Or just a random fluke?  Let's say
for example it's a CA-CB bond that is supposed to be 1.529 A long, but in
your model it's 1.579 A.  This is 0.05 A too long. Doesn't seem like
much, right? But the "sigma" given to such a bond in our geometry
libraries is 0.016 A.  These sigmas are typically derived from a
database of observed bonds of similar type found in highly accurate
structures, like small molecules. So, that makes this a 3-sigma outlier.
Assuming the distribution of deviations is Gaussian, that's a pretty
unlikely thing to happen. You expect 3-sigma deviates to appear less
than 0.3% of the time.  So, is that significant?

But, then again, there are lots of other bonds in the structure. Let's
say there are 1000. With that many samplings from a Gaussian
distribution you generally expect to see a 3-sigma deviate at least
once.  That is, do an "experiment" where you pick 1000 Gaussian-random
numbers from a distribution with a standard deviation of 1.0. Then, look
for the maximum over all 1000 trials. Is that one > 3 sigma? It probably
is. If you do this "experiment" millions of times it turns out seeing at
least one 3-sigma deviate in 1000 tries is very common. Specifically,
about 93% of the time. It is rare indeed to have every member of a
1000-deviate set all lie within 3 sigmas.  So, we have gone from one
3-sigma deviate being highly unlikely to being a virtual certainty if
you look at enough samples.

So, my question is: is a 3-sigma deviate significant?  Is it
significant
only if you have one bond in the structure?  What about angles?
What if
you have 500 bonds and 500 angles?  Do they count as 1000 deviates
together? Or separately?

I'm sure the more mathematically inclined out there will have some
intelligent answers for the rest of us, however, if you are not a
mathematician, how about a vote?  Is a 3-sigma bond length deviation
significant? Or not?

Looking forward to both kinds of responses,

-James Holton
MAD Scientist


Re: [ccp4bb] outliers

2022-11-08 Thread Dale Tronrud
   The second part of your question has to do with assessing the 
probability of correctness of a model by comparing the distribution of 
the individual values of geometry items with the distribution observed 
in large sets of high quality crystal structures.  Certainly, if your 
model has many more large deviants than expected from the observed 
distribution of deviants in quality models I would have doubts about it. 
 (I would also like to say that too few large deviants is a mark of 
shame too, but read on.)


   Actually, this is nothing more than comparing the rmsd bond lengths 
and rmsd bond angles with the rmsd's of the restraint library.  You are 
basically fitting a Normal distribution to both sets of observations and 
comparing their sigmas.  Remember when we used to do that, and still do 
implicitly when we publish these rmsd's in Table 1.


   What we have learned is that a model with rmsd's that are too large 
is certainly suspect, but people only rarely produce such models any 
more.  The real complication is that we, as a community, have decided 
based on other criteria that it is best for our models to have rmsd's 
for geometry that are much smaller than the rmsd's of our restraint 
libraries.


   The rmsd bond length of the quality models that I've seen tends to be 
around 0.02 A.  Looking in the PDB we tend to prefer 0.01 A and often 
less.  There are good reasons for this, based on the fact that low 
resolution data cannot define the correct values of the deviants and in 
that case we prefer to have deviants that are too small rather than deviants 
that have the correct magnitude distribution but are not related to the 
"real" deviants on a bond-by-bond basis.  (SigmaA weighting comes to 
mind as a similar solution to a similar problem.)


   If we assess the reliability of our models by looking to see if the 
distribution of deviants matches that of the library all of our models 
will be flagged as extremely unlikely.  Does that mean that matching the 
distributions will improve the model, as measured by the reliability of 
the individual or relative locations of the atoms?  I don't think so.


Dale E. Tronrud

On 11/8/2022 3:25 PM, James Holton wrote:

Thank you Ian for your quick response!

I suppose what I'm really trying to do is put a p-value on the 
"geometry" of a given PDB file.  As in: what are the odds the deviations 
from ideality of this model are due to chance?


I am leaning toward the need to take all the deviations in the structure 
together as a set, but, as Joao just noted, that it just "feels wrong" 
to tolerate a 3-sigma deviate.  Even more wrong to tolerate 4 sigma, 5 
sigma. And 6 sigma deviates are really difficult to swallow unless you 
have trillions of data points.


To put it down in equations, is the p-value of a structure with 1000 
bonds in it with one 3-sigma deviate given by:


a)  p = 1-erf(3/sqrt(2))
or
b)  p = 1-erf(3/sqrt(2))**1000
or
c) something else?



On 11/8/2022 2:56 PM, Ian Tickle wrote:

Hi James

I don't think it's meaningful to ask whether the deviation of a single 
bond length (or anything else that's single) from its expected value 
is significant, since as you say there's always some finite 
probability that it occurred purely by chance.  Statistics can only 
meaningfully be applied to samples of a 'reasonable' size.  I know 
there are statistics designed for small samples but not for samples of 
size 1 !  It's more meaningful to talk about distributions.  For 
example if 1% of the sample contained deviations > 3 sigma when you 
expected there to be only 0.3 %, that is probably significant (but it 
still has a finite probability of occurring by chance), as would be 
finding no deviations > 3 sigma (for a reasonably large sample to 
avoid sampling errors).


Cheers

-- Ian


On Tue, Nov 8, 2022, 22:22 James Holton  wrote:

OK, so let's suppose there is this bond in your structure that is
stretched a bit.  Is that for real? Or just a random fluke?  Let's say
for example it's a CA-CB bond that is supposed to be 1.529 A long, but in
your model it's 1.579 A.  This is 0.05 A too long. Doesn't seem like
much, right? But the "sigma" given to such a bond in our geometry
libraries is 0.016 A.  These sigmas are typically derived from a
database of observed bonds of similar type found in highly accurate
structures, like small molecules. So, that makes this a 3-sigma outlier.
Assuming the distribution of deviations is Gaussian, that's a pretty
unlikely thing to happen. You expect 3-sigma deviates to appear less
than 0.3% of the time.  So, is that significant?

But, then again, there are lots of other bonds in the structure. Let's
say there are 1000. With that many samplings from a Gaussian
distribution you generally expect to see a 3-sigma deviate at least
once.  That is, do an "experiment" where you pick 1000 Gaussian-random
numbers from a distribution with a standard deviation 

Re: [ccp4bb] outliers

2022-11-08 Thread Dale Tronrud
   Let's say you have decided that you want to know if the CA-CB bond 
of residue 123 in your favorite protein differs from the expected value 
for that type of bond.  You solve the structure and refine a model 
against your crystallographic data, then look at residue 123's CA-CB 
bond and find that it is 3 sigma from the expected value.  Is this 
observation unlikely given the uncertainties in the parameters of the model?


   Now, let's look at a different case.  You have solved and refined a 
model of your favorite protein.  After examining all of 1000 bond 
lengths in your model you notice that the CA-CB bond of residue 123 is 3 
sigma from its expected value.  Is this observation unlikely given the 
uncertainties in the parameters of the model?


   Even though you are looking at the same bond in the same model and 
see exactly the same thing, the calculation of the probability that this 
bond is actually different from the usual value is very different.  The 
calculation that you want to perform - the classic p test based on a 
Normal distribution - is valid for the first case but is quite 
inappropriate for the second.


   It is clearly much more likely that, among 1000 bonds, one of them 
will have a deviation of 3 sigma.  In fact I would say it is a near 
certainty.
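
   A minimal Python sketch of that claim (an illustration assuming 1000 
independent Gaussian deviates; it also evaluates the two candidate formulas 
quoted from James Holton's message below):

# Chance of at least one |z| > 3 among N independent standard-Normal deviates:
# closed form versus a quick Monte Carlo check.
import random
from math import erf, sqrt

p_single = 1.0 - erf(3.0 / sqrt(2.0))    # option (a): a single pre-chosen bond, ~0.0027
N = 1000
p_any = 1.0 - (1.0 - p_single) ** N      # option (b), i.e. 1 - erf(3/sqrt(2))**N, ~0.93

trials = 5_000
hits = sum(
    any(abs(random.gauss(0.0, 1.0)) > 3.0 for _ in range(N))
    for _ in range(trials)
)

print(f"p-value for a single pre-chosen bond  : {p_single:.4f}")
print(f"P(at least one of {N} beyond 3 sigma) : {p_any:.3f}")
print(f"Monte Carlo estimate of the same      : {hits / trials:.3f}")

The first number answers the first question above; the second and third 
answer the second question, and they are very different.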


   This twist of statistical analysis was never discussed in the basic 
classes on stats that I took and most scientists tend to ignore it.  To 
avoid the apparent paradox that you are confronting you have to include 
in your calculations the consequences of the actual question you have asked.


   There are huge problems with calculating this sort of "significance" 
because it is quite tempting to change your question after the fact and 
conclude that something is significant when it is not.  TNT always 
produced a list of the geometry outliers after refinement.  If you 
notice that a residue in the active site is present in that list, you 
will be tempted to forget that this residue was brought to your 
attention by a search over all geometry restraints and not a prior 
interest in the active site.


   This is a problem that many other fields of research are contending 
with.  One solution is to publish the questions you hope your model will 
answer before you perform the research.  That is certainly difficult 
with our sort of research.


   An example from another area might be helpful.  A researcher 
performs a survey of a lot of people asking questions about their diet 
and about their medical history.  Very often the published conclusion 
will be that, say, dietary item number 5 is correlated with medical 
condition number 12.  These studies tend to assess the significance of 
this result by just comparing the odds of these two items having the 
observed magnitude of correlation.


   This ignores the fact that a host of correlations were calculated 
and only this one was "significant".  If the survey had 20 dietary 
factors and 20 conditions then 400 comparisons were made and it was a 
virtual certainty that one of them would be "significant" unless the 
proper correction is made to the probability calculations.
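
   To put numbers on that, a minimal sketch assuming 400 independent 
comparisons, each tested at the conventional 5% level:

# Family-wise error rate for many independent tests, and the Bonferroni
# correction that restores a 5% overall false-positive rate.
alpha = 0.05          # per-comparison significance threshold
n_tests = 20 * 20     # 20 dietary factors x 20 medical conditions

p_any_false_positive = 1.0 - (1.0 - alpha) ** n_tests
alpha_corrected = alpha / n_tests

print(f"P(at least one spurious 'significant' correlation) = {p_any_false_positive:.10f}")
print(f"Bonferroni-corrected per-test threshold             = {alpha_corrected:.6f}")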


Dale E. Tronrud

On 11/8/2022 3:25 PM, James Holton wrote:

Thank you Ian for your quick response!

I suppose what I'm really trying to do is put a p-value on the 
"geometry" of a given PDB file.  As in: what are the odds the deviations 
from ideality of this model are due to chance?


I am leaning toward the need to take all the deviations in the structure 
together as a set, but, as Joao just noted, that it just "feels wrong" 
to tolerate a 3-sigma deviate.  Even more wrong to tolerate 4 sigma, 5 
sigma. And 6 sigma deviates are really difficult to swallow unless you 
have trillions of data points.


To put it down in equations, is the p-value of a structure with 1000 
bonds in it with one 3-sigma deviate given by:


a)  p = 1-erf(3/sqrt(2))
or
b)  p = 1-erf(3/sqrt(2))**1000
or
c) something else?



On 11/8/2022 2:56 PM, Ian Tickle wrote:

Hi James

I don't think it's meaningful to ask whether the deviation of a single 
bond length (or anything else that's single) from its expected value 
is significant, since as you say there's always some finite 
probability that it occurred purely by chance.  Statistics can only 
meaningfully be applied to samples of a 'reasonable' size.  I know 
there are statistics designed for small samples but not for samples of 
size 1 !  It's more meaningful to talk about distributions.  For 
example if 1% of the sample contained deviations > 3 sigma when you 
expected there to be only 0.3 %, that is probably significant (but it 
still has a finite probability of occurring by chance), as would be 
finding no deviations > 3 sigma (for a reasonably large sample to 
avoid sampling errors).


Cheers

-- Ian


On Tue, Nov 8, 2022, 22:22 James Holton  wrote:

OK, so let's suppose there is this bond in your structure that is
stretched a bit.  Is that for real? Or just a random fluke?  Let's

Re: [ccp4bb] Multiplicity is more than 20

2022-09-20 Thread Dale Tronrud

   Okay, I'll weigh in on the R factor issue...

   The free R that we calculate is only an estimate of the "true free 
R".  What we really want to know is what would the R value be if there 
there was no "bias" to the atomic model.  We make this estimate by 
pulling a subset of the data out of the refinement.  If you pull out a 
different subset you will get a somewhat different "free R estimate".


   If you run a lot of refinement with different subsets you will get a 
distribution of "free R estimates" and that distribution will have a 
standard deviation.  The smaller the number of reflections in the subset 
the larger that standard deviation will be.


   One shell of data will have only a few reflections so the standard 
deviation will be large and a particular instance may very well have a 
value for the "free R estimate" that is smaller than the working R, 
particularly when the overall R values are similar as in your refinement.
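
   A toy illustration of that spread (a sketch under simplified assumptions: 
made-up per-reflection residuals, no real data or model bias involved):

# Toy model: the scatter of "free R estimates" from different random
# test-set choices grows as the test set gets smaller.
import random

random.seed(0)

# Fake per-reflection |Fobs| and |Fobs - Fcalc| values -- not real data,
# just something that gives an overall R of roughly 0.20.
n_refl = 50_000
fobs = [random.uniform(10.0, 100.0) for _ in range(n_refl)]
resid = [f * abs(random.gauss(0.0, 0.25)) for f in fobs]

def r_factor(indices):
    """Sum |Fo - Fc| / Sum |Fo| over the selected reflections."""
    return sum(resid[i] for i in indices) / sum(fobs[i] for i in indices)

for test_size in (2500, 500, 100):        # ~5% test set, 1%, one thin shell
    estimates = []
    for _ in range(200):                  # 200 different test-set choices
        subset = random.sample(range(n_refl), test_size)
        estimates.append(r_factor(subset))
    mean = sum(estimates) / len(estimates)
    sd = (sum((r - mean) ** 2 for r in estimates) / len(estimates)) ** 0.5
    print(f"test set of {test_size:5d} reflections: free R = {mean:.3f} +/- {sd:.3f}")

With only ~100 reflections in a shell the spread is of order 0.01-0.02 in R, 
easily enough for an individual "free R estimate" to fall below the working R.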


Dale Tronrud

On 9/19/2022 11:30 AM, Prasun Kumar wrote:

Hi All:

I have collected a dataset for a crystal of a 30-residue-long helical 
peptide that forms a trimer in solution. I also solved the structure 
to get a trimer. My issues start when I start preparing for a deposition.


Details about the data:

space group: I 21 3
Resolution: 1.6 Å
Current Rfree/ Rwork: 0.21/0.19

Problems:
According to Aimless, the multiplicity is 33.9, and I understand that the 
value should be less than or equal to 20. Does it mean that I have a lot 
of random noise or ice rings or something similar?

For the inner shell, R work is also higher than R free.

Please guide me in solving the above issue.

Thank You
Prasun






Re: [ccp4bb] Odd Positive Density Around a Cystine

2022-08-10 Thread Dale Tronrud
   A large B value with positive difference density sure implies a 
convergence problem with the refinement.  Was the B value extreme in 
your starting model?  (A starting B that is wildly too large or too 
small may cause it to become trapped in the refinement.) 
Maybe if you rerun your refinement with a moderate starting value for 
the B you will end up with a more sensible result.


   The only other way to end up with a parameter that directly 
conflicts with the difference density is a bad restraint, but that 
doesn't sound likely based on your description.


Dale Tronrud

On 8/10/2022 7:59 AM, Thomas, Leonard M. wrote:

Hello All,

I have run into something odd.  In working on a structure for one of the 
groups I work with regularly, on one of the cysteine residues I have a 
very large positive density peak at the sulfur position. The B value is 
approximately 4 times the other values in the residue and on other 
cysteine residues.  The overall structure has 2 molecules in the 
asymmetric unit and the corresponding cysteine on the other monomer is 
behaving as I would expect.  There are no disulfides in the structure.


The data were collected on 9-2 at SSRL and all three of the data sets we 
collected show the same thing, all data go to about 2.2 angstroms.  We 
are trying to determine the ligand binding in the molecule but this 
cystine is not involved in ligand binding.  In house and other 
synchrotron data from previous protein preps and data collection runs of 
the same molecule grown in very similar condition and crystallized in 
the same space group have the residue behaving normally.


I am open to any ideas as to what may be going on as I am rather puzzled 
by this.


Thanks for any input,
Len Thomas

Leonard Thomas, Ph.D.
Biomolecular Structure Core, Director
Oklahoma COBRE in Structural Biology
Price Family Foundation Institute of Structural Biology
University of Oklahoma
Department of Chemistry and Biochemistry
101 Stephenson Parkway
Norman, OK 73019-5251
Office: (405)325-1126
lmtho...@ou.edu
http://www.ou.edu/structuralbiology/cobre-core-facilities/mcl







Re: [ccp4bb] AW: Unusual electron density at ASN residue

2022-07-22 Thread Dale Tronrud
   I don't have a particular solution in mind, but wish to note that, 
in my experience, something like this near an ASN or ASP residue is not 
"unusual electron density".  I have seen ugly density that looks like 
this many times.  I usually end up with partially occupied water 
molecules, but am never happy with that solution.


   Perhaps someone should do a survey of ASN residues in the PDB and 
see what the very high resolution/high quality models say?


Dale Tronrud

On 7/22/2022 4:37 AM, Eleanor Dodson wrote:
It is hard to see from a still but there couldn't be another residue 
there, could there? Take the main chain through the ASN side chain, the ASN 
side chain uses some density - then fit residue X into the excess??

Eleanor

On Thu, 21 Jul 2022 at 12:44, Schreuder, Herman /DE <herman.schreu...@sanofi.com> wrote:


I was also thinking of glycosylation, but as far as I know, E. coli
does not glycosylate proteins. 

Best,

Herman

*From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> *On behalf of* Emmanuel Saridakis
*Sent:* Thursday, 21 July 2022 12:14
*To:* CCP4BB@JISCMAIL.AC.UK
*Subject:* Re: [ccp4bb] Unusual electron density at ASN residue


Hi, have you checked for partial N-linked glycosylation? It's an
obvious site for this, although difficult to explain the density
being at both sides of the ASN (maybe disorder) ?

Emmanuel



*From: *"Nika Žibrat" mailto:nika.zib...@ki.si>>
*To: *"CCP4BB" mailto:CCP4BB@JISCMAIL.AC.UK>>
*Sent: *Wednesday, 20 July, 2022 16:58:49
*Subject: *Re: [ccp4bb] Unusual electron density at ASN residue

__ __

The crystallization condition is (NH4)2SO4, glycerol and
2-propanol. The distance between the residue and the last visible
histidine (4 out of 6 His) on the C-terminus is 12.5 Angstroms.



Thank you, 

Nika 



*From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK> on behalf of Jon Cooper
<488a26d62010-dmarc-requ...@jiscmail.ac.uk>
*Reply to:* Jon Cooper <jon.b.coo...@protonmail.com>
*Date:* Wednesday, 20 July 2022 at 15:36
*To:* CCP4BB@JISCMAIL.AC.UK
*Subject:* Re: [ccp4bb] Unusual electron density at ASN residue



Hello, what is the crystallisation condition and cryoprotectant? I
am thinking it might just be poorly defined waters. Also, roughly
how close is the Asn to the most C-terminal residue in the structure
in Angstroms?

Best wishes, Jon.C.


Sent from ProtonMail mobile



 Original Message 
On 20 Jul 2022, 13:39, Nika Žibrat <nika.zib...@ki.si> wrote: 



Dear CCP4bb community, 



I have a question on which I would greatly appreciate your
advice.

I have solved a 25 kDa protein structure at 1.7 A resolution with
molecular replacement. The protein was expressed heterologously
in E. coli BL21 (DE3). There is a large uninterrupted electron
density on the ASN residue at the C-terminus of my protein (see
two images below). The density extends from both the nitrogen
and oxygen atoms.  The FOFCWT density on the oxygen disappears
at 5.5 rmsd and the one on the nitrogen side disappears at 6.3
rmsd. 

At the C-terminus there are 12 amino acids after this particular Asn
residue, six of which are the 6xHis tag. There are two molecules in
the ASU but this density is present only in molecule B.



Any input on how to proceed regarding this density would be
appreciated.



Best regards, 

Nika 










Re: [ccp4bb] Validation of structure prediction

2021-12-20 Thread Dale Tronrud
   I don't see any reason to believe that software designed to validate 
crystallographic or NMR models would have any utility validating 
AlphaFold predicted models.  Doesn't the prediction software already 
ensure that all the indicators used by Molprobity are obeyed?  I'm 
afraid that the tools to validate any new technique must be designed 
specifically for that technique. (And when they become available they 
will be useless for validating crystallographic models!)


Dale E. Tronrud

On 12/20/2021 10:28 AM, Nicholas Clark wrote:
The Molprobity server can be run online and only requires the 
coordinates in PDB format: http://molprobity.biochem.duke.edu/


Best,

Nick Clark

On Mon, Dec 20, 2021 at 11:10 AM Reza Khayat wrote:


​Hi,


Can anyone suggest how to validate a predicted structure? Something
similar to wwPDB validation without the need for refinement
statistics. I realize this is a strange question given that the
geometry of the model is anticipated to be fine if the structure was
predicted by a server that minimizes the geometry to improve its
statistics. Nonetheless, the journal has asked me for such a report.
Thanks.


Best wishes,

Reza


Reza Khayat, PhD
Associate Professor
City College of New York
Department of Chemistry and Biochemistry
New York, NY 10031







--
Nicholas D. Clark
PhD Candidate
Malkowski Lab
University at Buffalo
Department of Structural Biology
Jacob's School of Medicine & Biomedical Sciences
955 Main Street, RM 5130
Buffalo, NY 14203

Cell: 716-830-1908





Re: [ccp4bb] Analysis of NMR ensembles

2021-05-26 Thread Dale Tronrud
   You are much more knowledgeable than me about the details of 
structure determination via resonance spectroscopy.  I was attempting to 
come up with a toy example that showed that there are reasons for the 
absence of cross-peaks other than flexibility.  I accept that the 
misinterpretation I proposed would be difficult for a careful 
experimenter to make.


   I do get uncomfortable when I'm told that A implies B in a well 
executed experiment but I have no way of knowing if the model I'm 
looking at was constructed via a "well executed experiment".  Am I right 
that the PDB still has no validation slider for the quality of the fit 
of an NMR model to its data?


   I know, via existence proofs, that there are crystallographic models 
in the PDB that fit their data poorly or are simply unjustified by the 
experimental results.  Does this never happen with NMR?  How can I know 
that the model I'm looking at in the PDB was created using all the 
powerful techniques you describe, and that those techniques were 
correctly performed?


   Only with knowledge of those details can I say that the model I'm 
looking at is one of your "well-determined NMR ensembles" and I can 
trust that the variability in the ensemble reflects the structural 
variability.


Dale Tronrud

On 5/26/2021 2:38 PM, Mark J. van Raaij wrote:

Dear Dale,
Aren’t NMR spectroscopists, in contrast to us crystallographers, in the 
lucky situation that they should have noticed the absence of terminal 
residues during the assignment phase? I.e. they would usually have peaks 
for the protons of those residues in the 1D, TOCSY, COSY spectra, even though 
NOEs may be absent.
I agree with you that other reasons than flexibility could cause absence of 
NOE’s, although I think that for well-determined NMR ensembles in almost all 
cases it is indeed flexibility / multiple conformations. If not enough 
restraints have been input, you might get artificial “flexible” regions, and 
obtaining more NOEs, secondary structure restraints, measuring orientational 
restraints should shore these up.
(Assuming that in the previous assignment phase all proton peaks could be 
properly assigned of course).
Mark

Mark J van Raaij
Dpto de Estructura de Macromoleculas
Centro Nacional de Biotecnologia - CSIC
calle Darwin 3
E-28049 Madrid, Spain
Section Editor Acta Crystallographica F
https://journals.iucr.org/f/



On 26 May 2021, at 23:06, Dale Tronrud  wrote:

Dear Boaz,

   We are likely in agreement. "Deficient NOE's for some regions (e.g. loops) arise 
from their flexibility, ..."  This makes it sound like you agree that these 
deficiencies in other regions may be caused by properties other than flexibility.

   As an extreme example, the N-terminal region of a protein may have a broad 
distribution in the ensemble model either because this region experiences many 
conformations in solution, or because this peptide was cleaved from the protein 
at some earlier time and its absence was not recognized by the experimentalist.

Dale Tronrud

On 5/26/2021 1:06 PM, Boaz Shaanan wrote:

Hi Dale and Cecil,
This is quite a circular argument, isn't it? Deficient NOE's for some regions 
(e.g. loops) arise from their flexibility, hence they are not as well resolved 
as other (e.g. internal ) regions for which the number of NOE is large. So they 
are flexible by all accounts and, not surprisingly, align usually with high 
B-factor regions in the corresponding crystal structures. In cases where such 
flexible regions are held by crystal contacts the situations would likely be 
different.
Cheers,
Boaz
Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel
E-mail: bshaa...@bgu.ac.il
Phone: 972-8-647-2220
Fax:   972-8-647-2992 or 972-8-646-1710

*From:* CCP4 bulletin board  on behalf of Dale Tronrud 

*Sent:* Wednesday, May 26, 2021 10:46 PM
*To:* CCP4BB@JISCMAIL.AC.UK 
*Subject:* Re: [ccp4bb] Analysis of NMR ensembles
 I agree with Dr Breyton. The variability in an NMR ensemble does not
reflect "mobility" but simply "uncertainty" in conformation.  The spread
in coordinates in some regions simply reflects the lack of experimental
data which could define a single conformation.  There are many reasons
why these data are absent and high mobility is only one.
Dale Tronrud
On 5/26/2021 8:45 AM, Cécile Breyton wrote:

Hello,
In my understanding of NMR, the loops and termini that adopt very different 
conformations in the structure ensemble rather reflect the fact that for those 
residues, the number of constraints is lower, thus the number of structures 
that fulfil the constraints is larger. A dynamics study of the protein will 
be much more informative.
Cécile
On 26/05/2021 at 17:29, S. Mohanty wrote:

Hi Harry,

The superpose/overlay of all the str

Re: [ccp4bb] Analysis of NMR ensembles

2021-05-26 Thread Dale Tronrud

Dear Boaz,

   We are likely in agreement. "Deficient NOE's for some regions (e.g. 
loops) arise from their flexibility, ..."  This makes it sound like you 
agree that these deficiencies in other regions may be caused by 
properties other than flexibility.


   As an extreme example, the N-terminal region of a protein may have a 
broad distribution in the ensemble model either because this region 
experiences many conformations in solution, or because this peptide was 
cleaved from the protein at some earlier time and its absence was not 
recognized by the experimentalist.


Dale Tronrud

On 5/26/2021 1:06 PM, Boaz Shaanan wrote:

Hi Dale and Cecil,

This is quite a circular argument, isn't it? Deficient NOE's for some 
regions (e.g. loops) arise from their flexibility, hence they are not as 
well resolved as other (e.g. internal ) regions for which the number of 
NOE is large. So they are flexible by all accounts and, not 
surprisingly, align usually with high B-factor regions in the 
corresponding crystal structures. In cases where such flexible regions 
are held by crystal contacts the situations would likely be different.


Cheers,

                Boaz


Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel

E-mail: bshaa...@bgu.ac.il
Phone: 972-8-647-2220
Fax:   972-8-647-2992 or 972-8-646-1710

*From:* CCP4 bulletin board  on behalf of Dale 
Tronrud 

*Sent:* Wednesday, May 26, 2021 10:46 PM
*To:* CCP4BB@JISCMAIL.AC.UK 
*Subject:* Re: [ccp4bb] Analysis of NMR ensembles
     I agree with Dr Breyton. The variability in an NMR ensemble does not
reflect "mobility" but simply "uncertainty" in conformation.  The spread
in coordinates in some regions simply reflects the lack of experimental
data which could define a single conformation.  There are many reasons
why these data are absent and high mobility is only one.

Dale Tronrud

On 5/26/2021 8:45 AM, Cécile Breyton wrote:

Hello,

In my understanding of NMR, the loops and termini that adopt very 
different conformations in the structure ensemble rather reflect the 
fact that for those residues, the number of constraints is lower, thus 
the number of structures that fulfil the constraints is larger. A 
dynamics study of the protein will be much more informative.


Cécile

On 26/05/2021 at 17:29, S. Mohanty wrote:

Hi Harry,

The superpose/overlay of all the structures in PyMol should inform you 
the rigid part of the protein as well as the flexible part. The rigid 
part would have very low backbone RMSD or overlay tightly and the 
flexible part (loops, N-term and C-term etc.) would not superpose 
tightly. If you check literature, the dynamics of the protein may have 
been studied through NMR relaxation.


Smita


On Wednesday, May 26, 2021, 10:05:05 AM CDT, Harry Powell - CCP4BB 
<193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:



Hi

Given that there are plenty of people on this BB who are structural 
biologists rather than “just” crystallographers, I thought someone 
here might be able to help.


If I have a structure in the PDB (e.g. 2kv5) that is an ensemble of 
structures that fit the NOEs, is there a tool available that will give 
me some idea about the bits of the structure that do not vary much 
(“rigid”) and the bits that are all over the place (“flexible”)?


Would superpose or gesamt be a good tool for this? Ideally I’d like 
something that could add a figure to the B columns in a PDB file so I 
could see something in QTMG (or PyMol if forced…) or do other useful 
things with the information.


Harry







Re: [ccp4bb] Analysis of NMR ensembles

2021-05-26 Thread Dale Tronrud
   I agree with Dr Breyton. The variability in an NMR ensemble does not 
reflect "mobility" but simply "uncertainty" in conformation.  The spread 
in coordinates in some regions simply reflects the lack of experimental 
data which could define a single conformation.  There are many reasons 
why these data are be absent and high mobility is only one.


Dale Tronrud

On 5/26/2021 8:45 AM, Cécile Breyton wrote:

Hello,

In my understanding of NMR, the loops and termini that adopt very 
different conformations in the structure ensemble rather reflect the 
fact that for those residues, the number of constraints is lower, thus 
the number of structures that fulfil the constraints is larger. A 
dynamics study of the protein will be much more informative.


Cécile

On 26/05/2021 at 17:29, S. Mohanty wrote:

Hi Harry,

The superpose/overlay of all the structures in PyMol should inform you 
the rigid part of the protein as well as the flexible part. The rigid 
part would have very low backbone RMSD or overlay tightly and the 
flexible part (loops, N-term and C-term etc.) would not superpose 
tightly. If you check literature, the dynamics of the protein may have 
been studied through NMR relaxation.


Smita


On Wednesday, May 26, 2021, 10:05:05 AM CDT, Harry Powell - CCP4BB 
<193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:



Hi

Given that there are plenty of people on this BB who are structural 
biologists rather than “just” crystallographers, I thought someone 
here might be able to help.


If I have a structure in the PDB (e.g. 2kv5) that is an ensemble of 
structures that fit the NOEs, is there a tool available that will give 
me some idea about the bits of the structure that do not vary much 
(“rigid”) and the bits that are all over the place (“flexible”)?


Would superpose or gesamt be a good tool for this? Ideally I’d like 
something that could add a figure to the B columns in a PDB file so I 
could see something in QTMG (or PyMol if forced…) or do other useful 
things with the information.


Harry
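
   Along the lines Smita describes, a minimal Biopython sketch (an 
illustration only: it assumes every model in the ensemble contains the same 
atoms in the same order, which is normally true of a deposited NMR entry, 
and the file names are placeholders).  It superposes all models onto the 
first using CA atoms, then writes each atom's spread about its mean position 
into the B column so the structure can be coloured by that spread in QTMG 
or PyMol:

from Bio.PDB import PDBParser, PDBIO, Superimposer

parser = PDBParser(QUIET=True)
structure = parser.get_structure("ensemble", "2kv5.pdb")   # placeholder input file
models = list(structure)

# Superpose every model onto the first one using CA atoms.
ref_ca = [a for a in models[0].get_atoms() if a.get_name() == "CA"]
sup = Superimposer()
for model in models[1:]:
    mov_ca = [a for a in model.get_atoms() if a.get_name() == "CA"]
    sup.set_atoms(ref_ca, mov_ca)
    sup.apply(list(model.get_atoms()))

# RMSF of each atom about its mean position over the ensemble,
# written into the B-factor field of every model.
atoms_per_model = [list(m.get_atoms()) for m in models]
n_models = len(models)
for i in range(len(atoms_per_model[0])):
    coords = [atoms[i].get_coord() for atoms in atoms_per_model]
    mean = sum(coords) / n_models
    rmsf = float((sum(((c - mean) ** 2).sum() for c in coords) / n_models) ** 0.5)
    for atoms in atoms_per_model:
        atoms[i].set_bfactor(rmsf)

io = PDBIO()
io.set_structure(structure)
io.save("2kv5_rmsf_as_b.pdb")   # colour by B-factor in your graphics program

Keep in mind the caveat above: this spread measures how well the data pin 
down the conformation, not necessarily how mobile the region really is.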





Re: [ccp4bb] Unmodeled density

2021-05-26 Thread Dale Tronrud
   Something to give context and scale would be helpful.  Two views 
would also be good.


Dale Tronrud

On 5/26/2021 6:08 AM, leo john wrote:

Hi Group
Can you please suggest what this unmodeled blob can be (see appended 
picture)?
I have Malonate, Boric Acid and Peg in my condition, and crystals were 
soaked in GOL.


I have tried fitting PO4 and SO4 so far.

Thank You
John







Re: [ccp4bb] Linux Distro for setting up workstations - Is CentOS still a good choice?

2021-02-19 Thread Dale Tronrud
   For what it's worth, CentOS 7 will continue to be supported to the 
middle of 2024.  That will give you time to see how all this shakes out.


https://wiki.centos.org/About/Product

Since Red Hat Linux is founded on open source software anyone can fork a 
new Linux distribution based on it, just like CentOS was.  There already 
is one project doing exactly that


https://www.zdnet.com/article/almalinux-the-centos-linux-replacement-beta-is-out/

   You can use the three years of life of CentOS 7 to watch what 
happens and make a choice then.


   While I certainly dabble in distros, for the systems I run the big 
crystallographic packages on I tend to follow the pack and use a Linux 
that is widely used.  I'll stick with 7 until the rest of you settle on 
something.  I'm in no hurry.


Dale Tronrud

On 2/19/2021 12:35 PM, Matthias Zeug wrote:

Hi all,

I just came across the (already quite old) news that Red Hat is switching 
its support policy for CentOS to a rolling preview model (replacing 
CentOS Linux with CentOS Stream):


https://www.zdnet.com/article/why-red-hat-dumped-centos-for-centos-stream/

https://www.enterpriseai.news/2021/01/22/red-hats-disruption-of-centos-unleashes-storm-of-dissent/

I wondered if that has any implications for the community, as scientific 
programs - maybe except the big ones like coot, Phenix, and ccp4 - are 
often not **that** well maintained for an extended period. I had the 
impression CentOS was liked especially for its “unbreakability,”  and it 
seems to be the main developing platform for some widely used smaller 
programs (e.g., adxv).


Do you think it would be advisable to switch to a Ubuntu-distro when 
setting up new workstations in the future, or is it safe to stick to 
CentOS?


Please let me know what you think :-)

Best,

Matthias

Matthias Zeug

Buchmann Institute of Molecular Life Sciences

Goethe University Frankfurt






Re: [ccp4bb] Contagious, Self-Distributing "Vaccines?"

2021-02-17 Thread Dale Tronrud
   I think there is a great reluctance to release an "extremely 
contagious" virus into the world for any reason.  This is due to a 
combination of concern for informed consent, but more importantly, for 
fear of mutations causing a reversion to a dangerous form.  RNA viruses 
are particularly prone to mistakes when replicating their genome.


   The live form of the polio virus is the closest I know to the 
vaccination campaign you describe.  The vaccine is a version of polio 
virus with enough mutations to drop its replication rate to the point 
where it doesn't spread widely, so, on that point, it isn't a good model 
for your "extremely contagious" version.  This virus will replicate 
slowly in the gut for several months before immunity develops and the 
infection is eliminated.  This results in a very high level of 
protection for that person.


   A problem is that the virus is shed and can infect other people in 
the area who are not immune.  You could say "Great, more people 
vaccinated!".  Over the length of time of several successive infections 
reversion can occur and the disease can become severe enough to cause 
paralysis.


   This has a low probability but is avoided by ensuring that the live 
vaccine is only used when the surrounding population is well vaccinated, 
preventing repeated generations of infection. When there is a breakout 
of vaccine-derived polio the response is to sweep in and vaccinate as 
many people as possible.  Natural transmission cannot replace a 
vaccination program since the program is still required to clean up the 
mess when the live virus goes rogue.


   According to Wikipedia, in 2017 there were more cases of 
vaccine-related polio in the world than wild polio.  The live vaccine 
has not been used in the US for many years because the only cases of 
polio in the country were due to the vaccine.


100:  Your extremely contagious virus could never be recalled, and once 
it mutates, could only be overcome by an even more contagious vaccine 
virus. Goto 100


Dale Tronrud

On 2/17/2021 9:33 AM, Jacob Keller wrote:
It would seem to me that it should be possible to generate versions of 
the Covid virus that would:


A. be extremely contagious and yet
B. be clinically benign, and
C. confer immunity to the original covid virus.

If, then, this virus could be released, with appropriate "kill switch" 
safeguards built in, would this not solve the world's pandemic problems? 
Is there any reason, practically, why this approach would not be feasible?


Maybe we don't really know enough to manipulate A, B, C yet?

Or maybe it's too scary for primetime...nightmare bio-warfare apocalypse?

Has this sort of thing been done, or does it have a name?

Jacob
--

+

Jacob Pearson Keller

Assistant Professor

Department of Pharmacology and Molecular Therapeutics

Uniformed Services University

4301 Jones Bridge Road

Bethesda MD 20814

jacob.kel...@usuhs.edu; jacobpkel...@gmail.com


Cell: (301)592-7004

+






Re: [ccp4bb] strange crystals

2021-02-16 Thread Dale Tronrud

On 2/16/2021 11:47 AM, Gerlind Sulzenbacher wrote:

Dear all,

I wonder whether any of you have ever encountered such strange crystals.

In the images you can download via the link below, crystal1 looks 
hollow, crystal2 looks like a cubic crystal embedded within a hexagonal 
arrangement.


I don't know what causes the overall appearance of your cubic 
crystals being embedded within larger cubic objects.  I do know that the 
projection of a cube, down its three-fold axis, to a plane will create a 
hexagon.  I think that is the case in your second photo.  The six-fold 
symmetry is actually the result of a special view of the same cubes you 
are seeing elsewhere.


Dale Tronrud
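
A quick way to convince yourself of the hexagon point is a small sketch in Python with numpy (nothing crystallographic is assumed): project the eight vertices of a cube onto the plane perpendicular to a body diagonal and look at where they land.

import itertools
import numpy as np

# The eight vertices of a cube and its three-fold (body-diagonal) axis.
vertices = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
axis = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)

# Remove the component along the axis, i.e. project onto the perpendicular plane.
projected = vertices - np.outer(vertices @ axis, axis)
radii = np.linalg.norm(projected, axis=1)
print(np.round(radii, 3))
# The two vertices on the axis project to radius 0; the other six land at the
# same radius, 60 degrees apart, which is the hexagonal outline seen in the photo.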



https://filesender.renater.fr/?s=download=71d70da4-5ea2-4cf3-ae5f-5d89545574ce 



Maybe all this is just an optical Fata Morgana.

We are waiting for our next synchrotron beam time to see whether, and 
what kind of diffraction they could provide.


In the meanwhile, I leave you with the beauty, and maybe somebody wants 
to comment.


with best wishes,

Gerlind







Re: [ccp4bb] Space error while run NAMD simulation on DGX2

2021-02-02 Thread Dale Tronrud
   Remember, the program may be writing files to some partition other 
than the one where your current working directory is located.  Maybe /tmp, for 
example.


Dale Tronrud
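
A quick way to check that, using only the Python standard library (the paths are examples and may need adjusting for the system in question):

import os
import shutil

# Report free space on the partitions a simulation is likely to write to.
for path in (os.getcwd(), '/tmp', os.environ.get('TMPDIR', '/tmp')):
    total, used, free = shutil.disk_usage(path)
    print(f"{path}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")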

On 2/2/2021 12:36 PM, Amit Singh wrote:
Dear Abhilasha, It seems that the partition where you are running your 
simulation has no more space left. Kindly check the availability of 
space in the hard disk, especially in the working partition of the disk.


On Tue, 2 Feb 2021 at 9:29 PM, Abhilasha Thakur <mailto:thakur.08...@gmail.com>> wrote:


Hii..

I have downloaded the NAMD and VMD software. During my MD simulation
the first half runs smoothly (4/10 ns), but then the process stops, and
VMD also does not work when I try to view the MD simulation results. I
have run this on two different systems: one is Windows (Dell, 32 GB RAM,
Core i7, 10-core processor) with NAMD 2.14, and the second is a DGX2
with 16 GPUs running the Nvidia CUDA build of NAMD 3.0; the protein has
5650 atoms in total. Please suggest why this happens and what I can do
to solve this problem. I have attached a sample image of the error below.





--
with regards
Amit Singh





Re: [ccp4bb] AW: [ccp4bb] Finding partial occupancy monomer by MR ?

2020-12-10 Thread Dale Tronrud



On 12/10/2020 6:46 AM, Schreuder, Herman /DE wrote:

Dear Phil,
0.32 is awfully close to 1/3, which brings a nice mathematical puzzle to my 
mind to see if the 1/3 occupancy is somehow related to the 3 fully occupied 
monomers... It may also be related to a (trigonal??) space group...

You probably have already tried it, but phaser has the option to give it 
already solved molecules and ask it to search for additional molecules. Here I 
would indeed lower the expected % homology significantly, to crudely compensate 
for the low occupancy. In contrast to the advice of Dale, I would play around 
with the % homology to find the value which works best.


   It was not my intention to imply that one should not "explore" one's 
problem with multiple interpretations -- only that you have to adjust 
your assessment of the significance of the results of those tests.


   For example, following MR, where you have varied some small multiple 
of 6 parameters, you trust the working R factor.  After you have 
"explored" an great number of models by varying thousands of parameters 
the working R now has to be judged by different criteria.  If this 
change is not made one will be misled to have more confidence in the 
model than is truly justified.  We all agree on this.


   This problem, however, is not specific to the working R.  If you 
calculate five kinds of maps and pick the one with the tallest peak at 
your favorite site, the "sigma" of that peak has to be considered less 
significant than a peak of equal "sigma" in a map that you decided to 
calculate before you collected data.  (I'll leave my objections to the 
idea of measuring peaks in "sigma"s for another day.)


   When we go hunting in situations like Prof. Jeffrey's our criteria 
are more squishy.  Usually one varies a parameter or chooses an 
alternative map type based on "interpretivity".  All I'm saying is, if 
you calculate twenty different maps and pick the one that is easiest to 
interpret, you have to consider the significance of that increase in 
ease of interpretation to be less than if you defined ahead of time the 
map you planned to look at.


   Yes, when your defined protocol fails, look around for alternatives. 
 Just ensure that your personal skepticism setting is cranked up when 
doing so.


Dale Tronrud
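
To put a rough number on how quickly that significance erodes, here is an illustrative calculation only; the 5% per-map chance of a convincing-looking spurious feature is an arbitrary stand-in, not something derived from any data.

# Chance that at least one of N independently tried maps looks "convincing"
# purely by chance, if each single look carries a 5% false-positive chance.
p_per_look = 0.05
for n_maps in (1, 5, 20):
    p_any = 1.0 - (1.0 - p_per_look) ** n_maps
    print(f"{n_maps:2d} maps tried -> {p_any:.0%} chance of a spurious 'best' map")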



My 2 cents,
Herman


-Ursprüngliche Nachricht-
Von: CCP4 bulletin board  Im Auftrag von Phil Jeffrey
Gesendet: Donnerstag, 10. Dezember 2020 14:49
An: CCP4BB@JISCMAIL.AC.UK
Betreff: [ccp4bb] Finding partial occupancy monomer by MR ?

Preamble:
I have an interesting crystal form with 3 monomers (~400aa) at full occupancy and 
apparently one at much reduced occupancy.  It was built recently from Se-SAD and was in 
moderately good condition: Rfree=32% for trimer, 2.6 Å.  In recent refinement cycles it 
became obvious that there was a 4th monomer in a region of weaker/choppy 2Fo-Fc and Fo-Fc 
density that corresponded to a "confusing" set of low-occupancy SeMet sites 
found by SHELXD and Phaser-EP.  The experimental map was bad in that region and was 
probably flattened during density modification anyway, in retrospect.

Question:
Phaser failed to find the 4th monomer after trivially finding the other
3 with a recent version of the monomer.  I'm wondering if there's a way to indicate 
"this one is partial occupancy" to Phaser, or if there's a way to improve the 
odds of success beyond just lowering the expected % homology.  Or if anyone has had 
success with other programs.  This is perhaps a rare edge case but I naively expected 
Phaser to work.

In the end I used the weak SeMet sites to locate the monomer and the occupancy 
appears to be around 0.32 in refinement.

Cheers,
Phil Jeffrey
Princeton




Re: [ccp4bb] Coming July 29: Improved Carbohydrate Data at the PDB -- N-glycans are now separate chains if more than one residue

2020-12-07 Thread Dale Tronrud
   We have drifted far from the original topic of this thread and if we 
continue I'll just make more of a fool of myself.


   I'll just go back to the original topic that I started with, that 
encoding connectivity information into an ID is not reliable or 
sustainable in a relational database.  I don't recall anyone in this 
long thread refuting this statement.


Dale Tronrud

On 12/5/2020 4:02 AM, Marcin Wojdyr wrote:

On Fri, 4 Dec 2020 at 22:36, Dale Tronrud  wrote:


 It is very important not to read more meaning into a data tag than
is actually defined in the mmCIF spec.  _atom_site.label_seq_id is defined

http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_atom_site.label_seq_id.html

as a pointer into the _entity_poly_seq table.


I think you inferred the relation between tables from textual
description. This should be avoided.
The relations are defined formally in the category
_pdbx_item_linked_group_list. In this case the relation between
atom_site and entity_poly_seq has three items that are to be matched
together:

_atom_site.label_comp_id = _entity_poly_seq.mon_id
_atom_site.label_entity_id = _entity_poly_seq.entity_id
_atom_site.label_seq_id = _entity_poly_seq.num


  It has to be a signed
integer (although I'm not clear on what a negative value for a pointer
means).


Do you interpret "pointer" as a row index? That's not how it's used. I
don't think that you can point to a position in the mmCIF table. In
general a "pointer" could be negative or not even numeric if the value
it points to is negative or not numeric. Although in this case
label_seq_id must be >=1, because that's the allowed range
(_item_range).


In that table there is a data item _entity_poly_seq.num,

http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_entity_poly_seq.num.html

which is not a pointer, not an ID, but a name for that particular


(it's not a pointer because there is no such a thing as a pointer in
the mmCIF technology. Or is there?)


_entity_poly_seq row.  It must be a number that is unique and
sequential, and presumably indicates a "sequence number".  Note that the
rows in _entity_poly_seq can be listed in the loop_ in any order.


For the record, here is the description:
"The value of _entity_poly_seq.num must uniquely and sequentially
identify a record in the ENTITY_POLY_SEQ list.
Note that this item must be a number and that the sequence
numbers must progress in increasing numerical order."


 This means that the _atom_site.label_seq_id could be "3", pointing
to the third entry in _entity_poly_seq which happens to have its .num
equal to "1".


No. If _atom_site.label_seq_id is 3 it points to _entity_poly_seq.num that is 3.


You may not think that someone would choose to do this,
but if the first .num is -15 you can't avoid a mismatch.  In either case
the mmCIF is perfectly acceptable and the meaning is absolutely clear.


It's formally guaranteed to be >= 1. Although it's not guaranteed that
the sequence starts with 1, because mmCIF has no way to do this. And
it's not explicitly stated in the description. So you could argue that
the sequence numbers can start with 15.
Now, the intention of _atom_site.label_seq_id has always been that
it's the position wrt the full sequence (PDB people: correct me if I'm
wrong). This is how it's interpreted by the PDB people and software.
But no one thought to explicitly write that in addition to "increasing
numerical order" the numbers must start with 1. What to do in such a
case?

There are much better examples of lacking description.
For example, if you'd interpret anisotropic ADPs according to the
mmCIF description you'd get wrong values (for non-orthogonal systems).
Because the description was copied from the small-molecule spec which
uses different axes than PDB.
Or take auth_seq_id: originally it was used for sequence ID, then it
was changed to sequence number and the definition has not been
updated.
You could find many more examples. ( has been extensively
debated during PDBx/mmCIF WG meetings this year).

My take in all such cases is that it's better to interpret things how
they are used and how they were intended to be interpreted rather than
hold to the wording used in the specification. In an ideal world the
specification would be always correct and would cover all corner
cases. But in the meantime it's better to focus on getting things done
with what is available.


You also can't assume that the row with
_entity_poly_seq.num equal to "3" is chemically linked to the one with
.num equal to "2", much less the chemical nature of such a link.
_entity_poly_seq is not a data table that defines chemistry, only
"sequence".


OK, so how do you propose to find links between polymer residues?
The table with connections doesn't list peptide bonds in a protein
chain - they are implicit.

Marcin




Re: [ccp4bb] Coming July 29: Improved Carbohydrate Data at the PDB -- N-glycans are now separate chains if more than one residue

2020-12-04 Thread Dale Tronrud

On 12/4/2020 12:15 PM, Marcin Wojdyr wrote:
> On Fri, 4 Dec 2020 at 19:16, Dale Tronrud  wrote:
>> learn the sequence you have to go to the mmCIF records that define the
>> connectivity between residues.  It is entirely possible that "3" comes
>> before "1" because these indexes don't contain any information, other
>> than being unique within the chain.
>
> In mmCIF you have label_seq_id that must be both unique and
> sequential. So 3 is always the third residue wrt to the full sequence.
>

   It is very important not to read more meaning into a data tag than 
is actually defined in the mmCIF spec.  _atom_site.label_seq_id is defined


http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_atom_site.label_seq_id.html

as a pointer into the _entity_poly_seq table.  It has to be a signed 
integer (although I'm not clear on what a negative value for a pointer 
means).  In that table there is a data item _entity_poly_seq.num,


http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_entity_poly_seq.num.html

which is not a pointer, not an ID, but a name for that particular 
_entity_poly_seq row.  It must be a number that is unique and 
sequential, and presumably indicates a "sequence number".  Note that the 
rows in _entity_poly_seq can be listed in the loop_ in any order.  You 
can't assume that the order they are listed in the mmCIF says anything 
about connectivity.  You get the order of the "things" in the sequence 
from _entity_poly_seq.num.


   This means that the _atom_site.label_seq_id could be "3", pointing 
to the third entry in _entity_poly_seq which happens to have its .num 
equal to "1".  You may not think that someone would choose to do this, 
but if the first .num is -15 you can't avoid a mismatch.  In either case 
the mmCIF is perfectly acceptable and the meaning is absolutely clear.


   Pulling up one of my favorite PDB entries I get

loop_
_entity_poly_seq.entity_id
_entity_poly_seq.num
_entity_poly_seq.mon_id
_entity_poly_seq.hetero
1 1   ILE n
1 2   THR n
1 3   GLY n
1 4   THR n
1 5   SER n
1 6   THR n
1 7   VAL n

These rows are listed in order of their .num item, and all the 
_atom_site.label_seq_id's will be equal to the _entity_poly_seq.num, but 
nothing in the spec forces that to be the case, and your software should 
not, ever, make that assumption.  Your software should also never assume 
that successive rows in _entity_poly_seq are chemically linked.  The 
order is arbitrary.  You also can't assume that the row with 
_entity_poly_seq.num equal to "3" is chemically linked to the one with 
.num equal to "2", much less the chemical nature of such a link. 
_entity_poly_seq is not a data table that defines chemistry, only 
"sequence".


   The whole point of a proper data base structure is that you don't 
assume anything!  All information has to be specifically encoded in the 
tables of the data base.  If your software makes use of a particular 
tag, you should go to the definition of that tag and use it, and not 
make additional extrapolations about it.
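
To make that concrete, here is a minimal sketch (plain Python, with two tiny hard-coded tables standing in for a parsed mmCIF) of how the atom_site to entity_poly_seq relation is meant to be resolved: by matching the key items the dictionary declares (label_comp_id/mon_id, label_entity_id/entity_id, label_seq_id/num), never by row position.

# Rows of _entity_poly_seq, deliberately listed out of order: (entity_id, num, mon_id)
entity_poly_seq = [('1', 3, 'GLY'), ('1', 1, 'ILE'), ('1', 2, 'THR')]

# A few _atom_site rows: (label_entity_id, label_seq_id, label_comp_id, atom name)
atom_site = [('1', 1, 'ILE', 'CA'), ('1', 3, 'GLY', 'CA')]

# The lookup is built on the full composite key, so row order is irrelevant.
parents = {(entity, num, mon) for entity, num, mon in entity_poly_seq}

for entity_id, seq_id, comp_id, atom_name in atom_site:
    key = (entity_id, seq_id, comp_id)
    if key not in parents:
        raise ValueError(f"atom_site row {key} has no matching entity_poly_seq row")
    print(f"atom {atom_name} of {comp_id} {seq_id} resolved through key {key}")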


   I'm not saying that the data tag definitions of mmCIF are perfect, 
far from it.  But the foundation on CIF is sound and you have to stick 
with that formal structure, based in data base theory, if you are going 
to get the benefit of a proper data base.


   We have been used to the slap-dash world of PDB format for decades, 
where we try to make it work by stuffing extra characters on the end of 
the line or in a little gap whose real purpose has been forgotten. 
This has led to nothing but grief.  When I was writing my refinement 
program I can tell you that the most complex and difficult subroutine 
system was the one trying to read PDB files.  There were PDB files that 
had the number of electrons in the atom written in the occupancy column! 
 Some had the name of a calcium atom shifted to the left and some did 
not, making them indistinguishable from Calpha atoms.  The PDB format is 
an insane mess and is completely unworkable.  Please, let it die!


   The problem with Dr. Croll's suggestion "Using chain A as an 
example, perhaps the glycans could become Ag1, Ag2, etc.?" is that it 
loads connectivity information into names.  How can one write a standard 
database validation script to verify the correctness of this 
information?  You have defined a meaning to the characters in a "name" 
which is not defined in the data schema.  On the other hand, the data in 
the mmCIF, as currently defined is certainly complete enough that his 
software could generate names of this style for display to his users. 
His user interface is not limited by mmCIF in any way, and "value added" 
features like this might make his software even more successful.


   I certainly agree that the names chosen by the authors are of 
considerable value when examining their model

Re: [ccp4bb] Coming July 29: Improved Carbohydrate Data at the PDB -- N-glycans are now separate chains if more than one residue

2020-12-04 Thread Dale Tronrud
   I agree that the user experience is very important, but that is not 
the purpose of a data base design.  The data scheme is designed for the 
storage and manipulation of data by software in a clear and unambiguous 
way. The presentation of the data to a user is the job of the 
application developer, such as yourself.  As anyone who has looked 
inside a mmCIF will tell you, it was not designed for human reading or 
editing.  Yes, it can be manually edited and read, and that is handy for 
people like you and I, but the average human protein modeler shouldn't 
be in there.


   According to Robbie, the information is present in the mmCIF to 
allow you to code a tool that will allow your users to navigate the 
model.  Maybe we can discuss off-line ideas for how this can be done.


   Anyway, I agree with you that representing glycans with one sugar 
differently than poly-glycans is not the best solution.  The PDB has 
shown little interest in my opinions on such matters in the past so I'm 
not getting involved in that argument.  I just jumped in to defend the 
adherence of mmCIF to formal data base theory, and suggest that the 
software developers reading mmCIF also stick to those rules, and not 
make unwarranted assumptions about the meaning of data items.


Dale Tronrud

On 12/4/2020 10:37 AM, Tristan Croll wrote:
OK, I understand your point more clearly now - but I'm not sure I fully 
agree, for the simple reason that people aren't computers. You're right 
that for the purposes of software validation tools the chain IDs are 
essentially arbitrary - as long as they're unique, nothing else really 
matters. But to a human simply wanting to /explore/ a model in their 
favourite visualisation program this makes everything just that bit less 
intuitive - if they want to, say, go to the first glycan attached to 
chain A they have no way of doing so short of tracing through from the 
N-terminus until they find it, unless the program provides a tool that 
already understands the concept of "first glycan attached to chain A". 
So if we go forward with the "chain IDs are entirely arbitrary, 
therefore it doesn't matter what they are" approach, then every existing 
visualisation tool gets a little bit more difficult to use with glycans 
until their authors take the time to write new task-specific code.


In the grand scheme of things it's a minor issue, I suppose - but in my 
opinion it really is important to keep the experience of the end user in 
mind when making decisions like this.

----
*From:* Dale Tronrud 
*Sent:* 04 December 2020 18:16
*To:* Tristan Croll ; CCP4BB@JISCMAIL.AC.UK 

*Subject:* Re: [ccp4bb] Coming July 29: Improved Carbohydrate Data at 
the PDB -- N-glycans are now separate chains if more than one residue


     Creating meaning in the chain names "A, B, C, Ag1, Ag2, Ag3" is
exactly the problem.  "chain names" ( or "entity identifiers" if I
recall the mmCIF terminology correctly) are simply database "indexes".
The values of indices are meaningless in themselves, they are just
unique values that can be used to unambiguously identify a record. In
principle, you could just assign random ISO characters (I don't think
mmCIF allows unicode) and the mmCIF would be considered identical.

     You are trying to force meaning onto the characters of an index, and
that puts multiple types of information in a single field.  As Robbie
said, this already exists: if you want to encode connectivity into the data
base you have to add records that define that connectivity.  That places
the connectivity information explicitly in the data model and allows
standard data base tools to track and validate it.

     The idioms of the PDB cause problems that lead people to these
mistakes.  The PDB assigns the indices "1", "2", and "3" to residues in
a chain.  A person could be misled into thinking that "2" comes between
"1" and "3" in the sequence.  This is not necessarily true at all.  To
learn the sequence you have to go to the mmCIF records that define the
connectivity between residues.  It is entirely possible that "3" comes
before "1" because these indexes don't contain any information, other
than being unique within the chain.

Dale Tronrud

On 12/4/2020 9:46 AM, Tristan Croll wrote:

 This suggestion violates a basic principle of data base theory.  A
 single data item cannot encode two pieces of information.

I'm sorry if I was unclear, but I don't believe I was suggesting 
anything of the sort. Hopefully this example should make it more clear - 
I'm just suggesting a slight variation on the existing system, no more:


If we start with model containing 3 protein chains A-C, with chain A 
containing amino acid residues 1-200, and 3 N-linked glycans with 
residues numbered, say, 1000-1005, 1020-1026 and 1040-1

Re: [ccp4bb] Coming July 29: Improved Carbohydrate Data at the PDB -- N-glycans are now separate chains if more than one residue

2020-12-04 Thread Dale Tronrud
   Creating meaning in the chain names "A, B, C, Ag1, Ag2, Ag3" is 
exactly the problem.  "chain names" ( or "entity identifiers" if I 
recall the mmCIF terminology correctly) are simply database "indexes". 
The values of indices are meaningless in themselves, they are just 
unique values that can be used to unambiguously identify a record. In 
principle, you could just assign random ISO characters (I don't think 
mmCIF allows unicode) and the mmCIF would be considered identical.


   You are trying to force meaning onto the characters of an index, and 
that puts multiple types of information in a single field.  As Robbie 
said, this already exists: if you want to encode connectivity into the data 
base you have to add records that define that connectivity.  That places 
the connectivity information explicitly in the data model and allows 
standard data base tools to track and validate it.


   The idioms of the PDB cause problems that lead people to these 
mistakes.  The PDB assigns the indices "1", "2", and "3" to residues in 
a chain.  A person could be misled into thinking that "2" comes between 
"1" and "3" in the sequence.  This is not necessarily true at all.  To 
learn the sequence you have to go to the mmCIF records that define the 
connectivity between residues.  It is entirely possible that "3" comes 
before "1" because these indexes don't contain any information, other 
than being unique within the chain.


Dale Tronrud

On 12/4/2020 9:46 AM, Tristan Croll wrote:

This suggestion violates a basic principle of data base theory.  A
single data item cannot encode two pieces of information.

I'm sorry if I was unclear, but I don't believe I was suggesting 
anything of the sort. Hopefully this example should make it more clear - 
I'm just suggesting a slight variation on the existing system, no more:


If we start with model containing 3 protein chains A-C, with chain A 
containing amino acid residues 1-200, and 3 N-linked glycans with 
residues numbered, say, 1000-1005, 1020-1026 and 1040-1043 (a fairly 
common approach I've seen taken to the problem in the past, and one I've 
taken myself), then if I understand correctly after remediation we'll 
have a model with protein chains A-C and glycan chains D-F. The problem 
is, unless and until all the available visualisation software updates to 
automatically associate chains D-F to chain A based on linkage, the user 
just has to remember that chains D-F are actually the chain A glycans. 
This is a simple case, but things quickly become far more messy when you 
have multiple glycosylated species each with multiple glycans per chain. 
If, instead, the new chain assignments were something like "A, B, C, 
Ag1, Ag2, Ag3", then we have something that is far more immediately 
accessible to the user.



*From:* Dale Tronrud 
*Sent:* 04 December 2020 17:01
*To:* Tristan Croll ; CCP4BB@JISCMAIL.AC.UK 

*Subject:* Re: [ccp4bb] Coming July 29: Improved Carbohydrate Data at 
the PDB -- N-glycans are now separate chains if more than one residue


     This suggestion violates a basic principle of data base theory.  A
single data item cannot encode two pieces of information.  The whole
structure of CIF falls apart if this is done.

     Does the new PDB convention contain a CIF record of the link that
bridges between the protein chain and the, now separated, glycan chain?
   If not, I think this is the principal failing of their new scheme.

Dale Tronrud

On 12/4/2020 12:06 AM, Tristan Croll wrote:
To go one step further: in large, heavily glycosylated multi-chain complexes the assignment of a random new chain ID to each glycan will lead to headaches for people building visualisations using existing viewers, because it loses the easy name-based association  of glycan to parent protein chain. A suggestion: why not take full 
advantage of the mmCIF capability for multi-character chain IDs, and 
name them by appending characters to the parent chain ID? Using chain A 
as an example, perhaps the glycans could become Ag1, Ag2, etc.?



On 4 Dec 2020, at 07:48, Luca Jovine  wrote:

CC: pdb-l

Dear Zhijie and Robbie,

I agree with both of you that the new carbohydrate chain assignment convention 
that has been recently adopted by PDB introduces confusion, not just for 
PDB-REDO but also - and especially - for end users.

Could we kindly ask PDB to improve consistency by either assigning a separate chain to all covalently attached carbohydrates (regardless of whether one or more residues have been traced), or reverting to the old system (where N-/O-glycans inherited the same chain ID of the protein to which they are attached)? The current hybrid solution hardly seems optimal...


Best regards,

Luca


On 3 Dec 2020, at 20:17, Robbie Joosten  wrote:

Dear Zhijie,

In

Re: [ccp4bb] Coming July 29: Improved Carbohydrate Data at the PDB -- N-glycans are now separate chains if more than one residue

2020-12-04 Thread Dale Tronrud


   This suggestion violates a basic principle of data base theory.  A 
single data item cannot encode two pieces of information.  The whole 
structure of CIF falls apart if this is done.


   Does the new PDB convention contain a CIF record of the link that 
bridges between the protein chain and the, now separated, glycan chain? 
 If not, I think this is the principal failing of their new scheme.


Dale Tronrud

On 12/4/2020 12:06 AM, Tristan Croll wrote:

To go one step further: in large, heavily glycosylated multi-chain complexes 
the assignment of a random new chain ID to each glycan will lead to headaches 
for people building visualisations using existing viewers, because it loses the 
easy name-based association of glycan to parent protein chain. A suggestion: 
why not take full advantage of the mmCIF capability for multi-character chain 
IDs, and name them by appending characters to the parent chain ID? Using chain 
A as an example, perhaps the glycans could become Ag1, Ag2, etc.?


On 4 Dec 2020, at 07:48, Luca Jovine  wrote:

CC: pdb-l

Dear Zhijie and Robbie,

I agree with both of you that the new carbohydrate chain assignment convention 
that has been recently adopted by PDB introduces confusion, not just for 
PDB-REDO but also - and especially - for end users.

Could we kindly ask PDB to improve consistency by either assigning a separate 
chain to all covalently attached carbohydrates (regardless of whether one or 
more residues have been traced), or reverting to the old system (where 
N-/O-glycans inherited the same chain ID of the protein to which they are 
attached)? The current hybrid solution hardly seems optimal...

Best regards,

Luca


On 3 Dec 2020, at 20:17, Robbie Joosten  wrote:

Dear Zhijie,

In general I like the treatment of carbohydrates now as branched polymers. I 
didn't realise there was an exception. It makes sense for unlinked carbohydrate 
ligands, but not for N- or O-glycosylation sites as these might change during 
model building or, in my case, carbohydrate rebuilding in PDB-REDO powered by 
Coot. Thanks for pointing this out.

Cheers,
Robbie


-Original Message-
From: CCP4 bulletin board  On Behalf Of Zhijie Li
Sent: Thursday, December 3, 2020 19:52
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Coming July 29: Improved Carbohydrate Data at the
PDB -- N-glycans are now separate chains if more than one residue

Hi all,

I was confused when I saw mysterious new glycan chains emerging during
PDB deposition and spent quite some time trying to find out what was
wrong with my coordinates.  Then it occurred to me that a lot of recent
structures also had tens of N-glycan chains.  Finally I realized that this
phenomenon is a consequence of this PDB policy announced here in July.


For future depositors who might also get puzzled, let's put it in a short
sentence:  O- and N-glycans are now separate chains if they contain more
than one residue; single residues remain with the protein chain.


https://www.wwpdb.org/documentation/carbohydrate-remediation

"Oligosaccharide molecules are classified as a new entity type, branched,
assigned a unique chain ID (_atom_site.auth_asym_id) and a new mmCIF
category introduced to define the type of branching
(_pdbx_entity_branch.type) . "





I found the differential treatment of single-residue glycans and multi-residue
glycans not only a bit lacking in aesthetics but also misleading.  When a structure
contains both NAG-NAG... and single NAG on N-glycosylation sites, it might
be because of a lack of density for building more residues, or because
some of the glycosylation sites are now indeed single NAGs (endoH etc.)
while others are not cleaved due to accessibility issues. Leaving NAGs
on the protein chain while assigning NAG-NAG... to a new chain feels like
suggesting something about their true oligomeric state.


For example, for cryoEM structures, building only a single NAG at a
site does not necessarily mean that the protein was treated with endoH. In
fact, all sites are extended to at least tri-Man in most cases. Then why
keep some sites associated with the protein chain while others are kicked
out?

Zhijie





From: CCP4 bulletin board  on behalf of John
Berrisford 
Sent: Thursday, July 9, 2020 4:39 AM
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] Coming July 29: Improved Carbohydrate Data at the PDB


Dear CCP4BB

PDB data will shortly incorporate a new data representation for
carbohydrates in PDB entries and reference data that improves the
Findability and Interoperability of these molecules in macromolecular
structures

Re: [ccp4bb] phenix.refine with ligand with ambiguous electron density

2020-12-03 Thread Dale Tronrud
try new styles of map calculations - 
But first I calculate those maps for cases where I know the answer. 
I've refined a fair number of structures, probably not as many as most 
of you, but at the end of a refinement I take the answer and go back to 
the original maps.  Looking at those maps in light of the answer is what 
improves my map interpretation skills, such as they are, the most.


   All of my practice has been with ED (and some ESP) maps of better 
than 3 A resolution.  Despite all the intuition I can bring to bear on 
them, when it comes to a 4 A resolution map I'm no better than an 
undergrad.


   Your first experience with a new technique should never be with your 
current project's data.  You should work to add that technique to your 
tool box, and then move back to your data.  Practice, and more practice 
will build that squishy neural network in your head.


Descending from soapbox,
Dale Tronrud


On 12/1/2020 8:31 AM, Robert Nicholls wrote:

Dear all,

I feel the need to respond following last week’s critique of the use of 
Coot’s map blurring tool for providing diagnostic insight and aiding 
ligand identification…


On 24 Nov 2020, at 16:02, Dale Tronrud <mailto:de...@daletronrud.com>> wrote:


To me, this sounds like a very dangerous way to use this tool to decide 
if a ligand has bound.  I would be very reluctant to modify my map 
with a range of arbitrary parameters until it looked like what I 
wanted to see.  The sharpening and blurring of this tool is not guided 
or limited by theory or data.


I disagree with this, subject to the important qualification that care 
is needed with interpretation. Blurring isn't a crime - it merely 
involves adjusting the weighting given to lower versus higher resolution 
reflections, and thus allows relaxation of the choice of high-resolution 
limit, and facilitates local investigation of regions that exhibit a 
poor signal-to-noise ratio. This is particularly pertinent to liganded 
compounds, which are typically present with sub-unitary occupancies.


Coot's blurring merely involves convolution of the whole map with an 
isotropic 3D Gaussian, with a parameter (B-factor) to control the 
standard deviation of the Gaussian. This corresponds to reweighting the 
structure factors in order to give higher weight to lower-resolution 
reflections. This approach is guided by a very simple theory: higher 
resolution structure factors (SFs) are typically noisier, with a 
worse signal-to-noise ratio than lower resolution SFs (due to increased 
errors in both observed higher-resolution reflections and calculated 
phases). Consequently, increasing the blurring B-factor reduces the 
effect of the noisier higher-resolution SFs. This results in a map that 
should be more reliable, but at the expense of reduced structural detail 
due to artificially reducing the effective resolution.
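
For anyone who wants to see exactly what that reweighting looks like, here is a small sketch (Python with numpy; the amplitudes and d-spacings are made-up numbers, and I am assuming the usual Debye-Waller convention in which a blur B scales each coefficient by exp(-B/(4 d^2)), positive B blurring and negative B sharpening):

import numpy as np

def reweight(f, d, b_blur):
    # Scale structure-factor amplitudes f with d-spacings d (Angstrom) by a blur B.
    return f * np.exp(-b_blur / (4.0 * d ** 2))

d = np.array([10.0, 4.0, 2.5, 1.8])       # a few reflections, low to high resolution
f = np.array([100.0, 80.0, 60.0, 40.0])   # their amplitudes

for b in (50.0, -50.0):                   # blur by 50 A^2, then sharpen by the same amount
    print(b, np.round(reweight(f, d, b), 1))
# With B = +50 the 1.8 A term is almost wiped out while the 10 A term barely moves,
# which is the "down-weight the noisy high-resolution data" behaviour described above.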


It should be noted that this does assume that lower resolution 
reflections are more reliable than higher resolution ones. So, good 
low-resolution data quality and completeness is important.


Unfortunately, determination of an optimal B-factor parameter is not 
presently automated. Consequently, users are currently expected to trial 
different values in the Coot slider tool in order to maximise 
information and gain, for want of a better word, intuition. 
Furthermore, due to the spatially heterogeneous nature of atomic 
positional uncertainty in macromolecular complexes, it can be that 
different B-factor parameters are of optimal usefulness in 
different local regions of the map that exhibit 
different signal-to-noise ratios. Such issues are on-going areas of 
research.


The main problem is that interpretation is subjective. In difficult 
cases, it is necessary to obtain as much information and insight as 
possible in order to gain a good intuition. If you can't see a ligand in 
the "standard" maps, but you can see evidence for a ligand in 
blurred density (or difference density) maps of the various types, then 
it means that careful exploration of those avenues is required. 
Any "evidence" from viewing such maps and map types should serve to 
guide intuition, and should be digested along with all other 
available information. Such complementary maps should be seen as 
diagnostics to gain intuition, rather than something that can be used as 
an unequivocal argument for ligand binding.


Ultimately, the presence of significant density in a blurred map means 
that there is something substantial present. Or in a blurred difference 
density that there is something missing from the current model. This 
could be a missing ligand, or it could be a mismodelled region of 
the macromolecule, or it could be mismodelled solvent (in which 
case re-evaluating any solvent mask may be worthwhile). Ultimately it is 
down to the practitioner to explore all potential explanations for any 
such behaviour, in order to maximise intuition and convince 
themselves of the crystal's st

Re: [ccp4bb] phenix.refine with ligand with ambiguous electron density

2020-11-25 Thread Dale Tronrud

Dear Jon,

   I don't think we have any disagreement.  I just wanted to emphasize 
that you should have your plan thought out at the start.  You may have 
decided that you will compare your ligand's shape with your map using a 
Real Space R Factor.  If you don't like that number the fault isn't in 
the RSR but in your density.


   Of course you could have decided ahead of time to use a correlation 
coefficient.  Or you could have planned to calculate both and defined 
some weighting scheme to use the two in making the decision.  The error 
comes in if your decision tree depends on how well the analysis justifies 
your desired outcome.


   Like your test set, how you make your decisions should be kept 
isolated from your actual analysis.  The power of the human mind to bend 
its choices toward a rewarding outcome, even unconsciously, is enormous.


   I was trying to stay away from the particulars of what that decision 
tree would look like.  That topic has been discussed many times on the 
BB.  I certainly agree with my good friend Blaine Moors that the Fo-Fo 
map is the gold standard for deciding if something bound and gives a map 
that is unbiased by modeling.  In addition, Fo-Fo maps between the 
crystals of varying occupancy, even very small changes in occupancy, are 
surprisingly informative.  They tend to be highly isomorphous and 
provide direct information for deconvoluting multiple conformations 
which is vital in partial occupancy binding.


Dale Tronrud

P.S. Changing the contour level does not change the map. That is simply 
a representation issue due to the difficulty of presenting all the 
information in a map.  Sharpening or blurring a map makes a new map, and 
since the sharpening factor is a continuous number that dial wheel 
creates an infinite number of different maps.  If your only means of 
selecting which one is "best" is how well the map fits your ligand, that 
map can't be used to justify your interpretation.


This could be made rigorous by, for example, deciding on the factor by 
looking at the quality of the map in some uncontroversial region -- If 
you decide on the means of choosing that region ahead of time.


On 11/24/2020 8:20 PM, Jon Cooper wrote:
Hello Dale, the statistical rigour you describe is, of course, 
excellent, but in a learning environment, if someone gets a negative 
result, you have to go into overdrive to check that everything has been 
done correctly, since there is a fair chance that human error is the 
cause. It may be a terrible practice, but it would seem to be an 
important part of the process? Even as a relative newcomer to the field 
(well, since the mid-80's ;-) I have seen many people getting nothing in 
their initial difference maps, even if the ligand is there. Frequently 
it was just the contour level being too high and, depending on how far 
back you go, the solution varied from showing someone how to roll the 
mouse wheel in Coot to having the map recontoured at a computer centre 
200 miles away and posted back on a magnetic tape, which took about 10 
days - a timescale on which some people just gave up and did something 
else! I can't help thinking it would be a shame to robotically accept 
every negative result at face value, not least if you're doing something 
important like curing a pandemic. However, back to the original question 
which I think was whether polder map coefficients could be used as 
refinement targets and I think the answer to that one is probably 'no', 
at least in the X-ray field ;-)


Best wishes, Jon Cooper



 Original Message 
On 24 Nov 2020, 16:02, Dale Tronrud < de...@daletronrud.com> wrote:


Hi,

To me, this sounds like a very dangerous way to use this tool to decide
if a ligand has bound. I would be very reluctant to modify my map with
a range of arbitrary parameters until it looked like what I wanted to
see. The sharpening and blurring of this tool is not guided or limited
by theory or data.

As you describe it, your choice of map is driven by its agreement
with your ligand, and the proper way to make this decision is the other
way around.

The original poster has the problem that their density does not have
the appearance they desire. They have chosen to run around trying to
find some way to modify the map to get a variant that does. This is a
terrible practice, since the final choice of map is being made in a
fashion that is dominated by bias.

I have no idea what sort of "structural characteristics" have
convinced this poster of the presence of their ligand despite the
absence of clear electron density. What other evidence does a
diffraction pattern give? The map is your best and only source of
information about your structure that you can get from the diffraction
pattern. (Mass spec and other experimental techniques could, of course,
be applied.)

I think we, as a community, cou

Re: [ccp4bb] phenix.refine with ligand with ambiguous electron density

2020-11-24 Thread Dale Tronrud

Hi,

   To me, this sounds like a very dangerous way to use this tool to decide 
if a ligand has bound.  I would be very reluctant to modify my map with 
a range of arbitrary parameters until it looked like what I wanted to 
see.  The sharpening and blurring of this tool is not guided or limited 
by theory or data.


   As you describe it, your choice of map is driven by its agreement 
with your ligand, and the proper way to make this decision is the other 
way around.


   The original poster has the problem that their density does not have 
the appearance they desire.  They have chosen to run around trying to 
find some way to modify the map to get a variant that does.  This is a 
terrible practice, since the final choice of map is being made in a 
fashion that is dominated by bias.


   I have no idea what sort of "structural characteristics" have 
convinced this poster of the presence of their ligand despite the 
absence of clear electron density.  What other evidence does a 
diffraction pattern give?  The map is your best and only source of 
information about your structure that you can get from the diffraction 
pattern.  (Mass spec and other experimental techniques could, of course, 
be applied.)


   I think we, as a community, could learn a few things from the 
vaccine trial studies that are so much in the news now.  In a modern 
clinical trial, to avoid bias in the interpretation of the results, all 
of the statistical procedures are decided upon BEFORE the study is even 
begun.  This protocol is written down and peer reviewed at the start. 
Then the study is performed and the protocol is followed exactly.  If 
the results don't pass the test, the treatment is not supported.  There 
is no hunting around, after the fact, for a "better" statistical measure 
until one is found that "works".


   This way of handling data analysis in clinical trials was adopted 
after the hard lesson was learned that, while many trials could be repeated, 
their results were not reproduced.


   I would recommend that you decide what sort of map you think is the 
best at showing features of your active site, based on the resolution of 
your data set and other qualities of your project, before you calculate 
your first Fourier transform.  If you think a Polder map is the bee's 
knees then calculate a Polder map and live with it.  If you are 
convinced of the value of a FEM, or a Buster map, or a SA omit map, or 
whatever, calculate that map instead and live with it.


   If you have to calculate twenty different kinds of maps, with 
varying parameters in each, before you find the one that shows the 
density for your ligand, it probably didn't bind.


Dale Tronrud

On 11/24/2020 5:35 AM, John R Helliwell wrote:

Dear Nika,
A tool I am gaining experience with, but for a challenge like you 
describe, may help:-
  In Coot>Calculate you see “Blurring/Sharpening tool”. You are 
presented with a choice of electron density map (here you would select 
your Fo-Fc). There is then a slider tool, to the  left and to the right, 
and you can see the impact of negative or positive B factor on your map. 
Blurring, slide right, may assist your density continuity versus 
Sharpening, slide left, which may assist the detail of your map. The 
logic of the tool is that your diffraction data, and of the Fo-Fc 
differences, can be fine tuned, in or out.

Best wishes,
John

Emeritus Professor John R Helliwell DSc





On 24 Nov 2020, at 11:29, Nika Žibrat  wrote:



Hello,


I have a question about a protein-ligand structure in which the ligand displays 
ambiguous electron density. I am solving a structure of a protein with a 
ligand which was obtained via soaking. Structural characteristics 
indicate the ligand is present; however, the electron density is quite 
vague and too small for the size of the whole ligand. I did a Polder 
map, which showed a much larger area of green density. After inserting 
my ligand into the green density from the Polder map I ran phenix.refine, 
and there is a lot of red on the spot where the ligand is, which was to be 
expected. This leaves me wondering how, or even whether, I should incorporate 
the Polder map data into my refinement input.



My question is, how do I continue refining and validating the 
structure in this case?



Thank you,


Nika Žibrat






Re: [ccp4bb] R free rising

2020-11-02 Thread Dale Tronrud
On 11/2/2020 2:26 AM, Nika Žibrat wrote:
> 
> Hello,
> 
>  
> 
> I am trying to solve an X-ray structure of a protein of which the
> structure is already known. My aim is to only seek for ligands (soaking)
> and interpret any conformational changes. Since I am using a model with
> 100% sequence identity from PDB I am not doing Autobuild after Molecular
> phasing and continue directly with phenix.refine according to
> reccomendations (10 rounds). In accordance with X-triage I am also using
> NCS default settings in the refinement.
> 
>  
> 
> This refinement produces solid R free and R work values around 0.29 and
> 0.22. The problem becomes when I want to manually edit the structure,
> correct the loops which are changed upon binding of the ligand, and
> correct any outliers. This results in R free slightly lower than R work.
> Upon refining, R work drops normally while R free rises significantly
> (for 0.2 -0.3). I have been trying to crack this for a few days with no
> success.
> 
>  
> 
> I read that slightly lower R free can be normal in such cases but
> nevertheless both R values should drop, and haven’t found anything about
> the big rise of this value after refinement. It feels like I am missing
> something, since this is my first time solving a structure. Any advice?

   This is not normal behavior at all.  Rwrk and Rfree will be roughly
equal only before you perform any refinement.  The R's you report before
your model building sound quite reasonable.  When you manually change
the model you will likely cause both to increase, but you would have to
perform massive changes to get them to equalize at some larger value.

   The only thing I can think of that would cause this is for your
second refinement to be working with a newly created test set.  Is it 
possible that you have somehow reset your R free flags?  In an MTZ the
full data set is divided into twenty subsets -- one is the test set
while the other nineteen are the working set.  When you ran Refmac the
second time could you have told it to use a different segment as the
test set?
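
If it helps to see the bookkeeping, here is a minimal sketch of that 
convention (hypothetical numbers, not the actual CCP4/Refmac code): every 
reflection carries a flag from 0 to 19, and whichever flag value you 
declare "free" defines the test set.

    import numpy as np

    rng = np.random.default_rng(1)
    n_refl = 50000
    flags = rng.integers(0, 20, size=n_refl)   # one flag value per reflection

    free_value  = 0                    # the segment held out of refinement
    test_set    = flags == free_value  # roughly 5% of the data
    working_set = ~test_set

    # Re-running with, say, free_value = 7 quietly moves reflections that were
    # refined against into the "free" set, and the reported Rfree jumps.
    print(test_set.sum(), working_set.sum())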

Dale Tronrud

> 
>  
> 
> Thank you,
> 
> Nika
> 
>  
> 
> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB=1
> 





[ccp4bb] Refmac Ideal Geometry Library

2020-07-25 Thread Dale Tronrud
Hi,

   I'm seeking insight into some geometry outliers in my Refmac refined
model.  It would be nice to have confidence in the target values used by
Refmac.  Does Refmac use the library distributed by CCP4 in
lib/data/monomers, or does it have its own library squirreled away somewhere?

Dale Tronrud





Re: [ccp4bb] electron density close Histidine side chain

2020-07-20 Thread Dale Tronrud
   If there is a covalent link, maybe sending a sample off to mass spec
would be a good idea.  That would remove some of the guesswork.

Dale Tronrud

On 7/20/2020 9:16 AM, samer halabi wrote:
> Hello all,
> I have few blobs in an MHC II structure I am working on, especially
> opposite to Histidine as in the accompanying screenshot, that I am
> confused about.
> 
> In the crystal conditions, I have Tris, Imidazole, Acetate, PEG and
> Glycerol.
> Whatever ligand I am fitting in I am getting a clash (overlap -1.029),
> which makes me think whether there is a covalent bond forming between
> Histidine and other molecule. Perhaps by oxidation.
> 
> I would greatly appreciate if you can advice me about it, whether there
> is some kind of ligand I can try to fit and if this is something that
> occurs in some structures.
> Thank you.
> Best regards,
> Samer
> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB=1
> 





Re: [ccp4bb] What refinement programs are fully Open Source?

2020-05-07 Thread Dale Tronrud
   Could you point out where in this document the license is described?

   While not particularly relevant to your question, TNT is a reasonable
example of what Ethan is talking about.  TNT was always "open source"
since all source code was distributed to every user.  It has never been
"free", however.  To get it one had to agree to the license from the
University of Oregon and, if a for-profit organization, pay money.

Dale Tronrud

On 5/7/2020 10:18 AM, Roversi, Pietro (Dr.) wrote:
> Thank you Ethan for taking the the time to answer and explain.
> Yes I am sure I have asked a vague and imprecise question.
> 
> Practically, I am going to point to xia2 for data processing:
> https://www.ccp4.ac.uk/newsletters/newsletter48/articles/Xia2/manual.html
> 
> and hope it is "Open Source enough" - without too much scrutiny on
> dependencies?
> 
> So, what about a refinement suite of programs that is "just as Open
> Source" as xia2 is for data processing?
> 
> Unless this second message of mine is making my re-drafted question
> worse than the original one .
> 
> with best wishes,
> 
> Pietro
> 
> Pietro Roversi
> 
> Lecturer (Teaching and Research) https://le.ac.uk/natural-sciences/
> 
> LISCB Wellcome Trust ISSF Fellow
> 
> https://le.ac.uk/liscb/research-groups/pietro-roversi
> 
> 
> Leicester Institute of Structural and Chemical Biology
> Department of Molecular and Cell Biology, University of Leicester
> Henry Wellcome Building
> Lancaster Road, Leicester, LE1 7HB
> England, United Kingdom
> 
> Skype: roversipietro
> Mobile phone  +44 (0) 7927952047
> Tel. +44 (0)116 2297237
> 
> 
> 
> *From:* Ethan A Merritt 
> *Sent:* 07 May 2020 18:08
> *To:* Roversi, Pietro (Dr.) 
> *Cc:* CCP4BB@jiscmail.ac.uk 
> *Subject:* Re: [ccp4bb] What refinement programs are fully Open Source?
>  
> On Thursday, 7 May 2020 09:34:13 PDT Roversi, Pietro (Dr.) wrote:
>> Dear all,
>> 
>> we are in the editorial stages of a manuscript that I submitted to Wellcome 
>> Open Research for publication.
>> 
>> The journal/editor ask us to list fully Open Source alternatives to the 
>> pieces of software we used, for example for data processing and refinement.
>> 
>> What refinement programs are fully Open Source?
> 
> There are recurring battles and philosophical fractures over what exactly
> "open source" means, either in practice or aspirationally.
> You would do well to provide a definition before asking people for
> suggestions that meet your criteria. 
> 
> At one point the Open Source Foundation (OSF) claimed to have the authority
> to declare something was or was not "open source" and kept lists of
> approved code, but their definition was in conflict with guidelines from
> other places including funding agencies [*].  Also the OSF itself seems to
> have largely disappeared from view, so maybe that's a bad place to start.
> 
> There are at least two fracture lines in this battle.
> The one created by people who feel a need to distinguish between
> "free/libre code" and "open code",  and the one created by people
> whose main concern is "documentation and claims are not enough;
> I need to see the code actually used for the calculations reported in
> this work".
> Then there's the concern mostly of interest to corporate legal
> departments "can we use this in our commercial products".
> 
>     Ethan (coding veteran with scars from this battle)
> 
> 
> [*] it was also in conflict with the ordinary English language meaning
> of "open" and "source", which didn't help any.
> 
> 
>> 
>> Thanks!
>> 
>> Pietro
>> 
>> 
>> Pietro Roversi
>> 
>> Lecturer (Teaching and Research) https://le.ac.uk/natural-sciences/
>> 
>> LISCB Wellcome Trust ISSF Fellow
>> 
>> https://le.ac.uk/liscb/research-groups/pietro-roversi
>> 
>> 
>> Leicester Institute of Structural and Chemical Biology
>> Department of Molecular and Cell Biology, University of Leicester
>> Henry Wellcome Building
>> Lancaster Road, Leicester, LE1 7HB
>> England, United Kingdom
>> 
>> Skype: roversipietro
>> Mobile phone  +44 (0) 7927952047
>> Tel. +44 (0)116 22972

Re: [ccp4bb] neg density/high B on sidechains

2020-04-28 Thread Dale Tronrud
   I'm not aware that anything has changed since the last time this was
discussed on the BB.  Currently there is no ideal solution since even
the mmCIF PDB format does not allow a proper description of the
situation of atoms whose position cannot be placed due to the absence of
data.

   The "Ligand Validation workshop" produced recommendations (Structure,
2016, 24(4), 502-508) that included the addition of a flag to atomic
properties that indicates that the atom's position was generated w/o
experimental guidance. To my knowledge this has not been done.

   In the absence of this solution we can only repeat the, conflicting,
arguments in favor of or against the various less than optimal solutions.
You can go back to the archives to find those.

Dale Tronrud

On 4/28/2020 8:37 AM, Thomas, Leonard M. wrote:
> Hello all,
> 
> This is one of those issues that seems to come up now and then.  I have
> been working on a structure that obviously has some radiation damage as
> indicted by negative density and/or high thermal parameters.  Since we
> know that residue X is in the sequence the sidechain should be there and
> is just flopping around or has been damaged/removed by radiation
> exposure.  My questions is what is the current thinking on presenting
> these residues for deposition.  Remove the side chain atoms, drop the
> occupancy to zero, just let them  behave as they want ie high B factors
> some negative density.  
> 
> Cheers,
> Len
> 
> Leonard Thomas
> lmtho...@ou.edu <mailto:lmtho...@ou.edu>
> 
> 
> 
> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB=1
> 





Re: [ccp4bb] AW: [EXTERNAL] Re: [ccp4bb] Increase in R-factor following REFMAC

2020-04-08 Thread Dale Tronrud
   In addition, REFMAC has tightened your model's geometry which can
cause a slight increase in the working R.  A one percent increase 
while the R is in the upper thirties is slight.

Dale Tronrud

On 4/8/2020 7:30 AM, Schreuder, Herman /DE wrote:
> I guess the molecular replacement model has never been refined against
> this data set, so there is no reason why the Rfree should be better or
> worse than the Rfactor. The difference is larger than I would expect,
> but it may just be a statistical fluke. That the Rfree goes up
> significantly during refinement is what I would expect and suggests that
> the model needs significant rebuilding and refinement. Similar, the
> Rfactor going slightly up may indicate that the model is moving out of a
> false local minimum.
> 
>  
> 
> I would just continue rebuilding and refinement and see what happens.
> 
>  
> 
> Best,
> 
> Herman
> 
>  
> 
> *Von:* CCP4 bulletin board  *Im Auftrag von
> *Eleanor Dodson
> *Gesendet:* Mittwoch, 8. April 2020 15:36
> *An:* CCP4BB@JISCMAIL.AC.UK
> *Betreff:* [EXTERNAL] Re: [ccp4bb] Increase in R-factor following REFMAC
> 
>  
> 
> *EXTERNAL : *Real sender is owner-ccp...@jiscmail.ac.uk
> <mailto:owner-ccp...@jiscmail.ac.uk>
> 
>  
> 
> Well - there is something funny about your first Rfree - it shouldnt be
> significantly lower than R?
> 
> I suspect there is some muddle over the assignment of Rfree - one used
> for Phaser and a different value for REFMAC?
> 
>  
> 
> And of course at low resolution especiall Rfactors are very sensitive to
> scaling procedures.. 
> 
> Eleanor
> 
>  
> 
> On Wed, 8 Apr 2020 at 14:08, Kyle Gregory
> <3632e92fcc15-dmarc-requ...@jiscmail.ac.uk
> <mailto:3632e92fcc15-dmarc-requ...@jiscmail.ac.uk>> wrote:
> 
> Hello,
> 
> I haven't seen this before but doing a round of refinment with
> REFMAC, after molecular replacement with phaser, my R factor and R
> free have increased?
> 
> Also is it weird that my Rfree is smaller than my R factor?
> 
> *Result:*
> 
>   
> 
> *Initial*
> 
>   
> 
> *Final*
> 
> *R factor *
> 
>   
> 
> 0.3621
> 
>   
> 
> 0.3761
> 
> *R free *
> 
>   
> 
> 0.3108
> 
>   
> 
> 0.4733
> 
> *Rms BondLength *
> 
>   
> 
> 0.0076
> 
>   
> 
> 0.0057
> 
> *Rms BondAngle *
> 
>   
> 
> 1.6504
> 
>   
> 
> 1.5193
> 
> *Rms ChirVolume *
> 
>   
> 
> 0.0600
> 
>   
> 
> 0.0526
> 
> My map was ok(ish) but there was indication of anistropy so I
> attempted to improve it by using STARANISO and re-ran phaser with
> the output, and this is the results I see following REFMAC.
> 
> Thanks,
> 
> Kyle
> 
>  
> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB=1
> 
> 
> 
>  
> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB=1
> 
> 
> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB=1
> 





Re: [ccp4bb] Average B factors with TLS

2020-04-07 Thread Dale Tronrud
   This topic has been discussed on the BB many times and a little 
searching should give you some long-winded answers (some written by me).

   The short version.  If you refine a model with a common B factor for
all atoms, or keep a very narrow distribution of B's, you will end up
with an average B close to the Wilson B.  If you don't something is
seriously wrong.

   If you allow a distribution of B's in your model, that distribution
is skewed on the high side because B factors cannot go below zero but
there is no physical upper bound.  The average of those B's will always
be larger than the Wilson B.  How much larger depends on your refinement
method more than the properties of the crystal since it is determined by
how large a tail you allow your B distribution to have.

   You didn't say what your B factor model was when you achieved an
average value of 31 A^2.  This value seems tiny to me since it implies
that your intensities are falling off in resolution so slowly that you
surely should have been able to measure data to a higher resolution.  If
you decide to deposit this model you should look into why you have such
a low value.

   On the other hand, the average B of 157 A^2 seems quite reasonable
for a 3 A model (using modern resolution cutoff criteria).  It is higher
than your Wilson B, but that is expected.  In addition, as you note, the
uncertainty of a Wilson B is quite large in the absence of high
resolution data.

   Yes, this is the short version.  ;-)

Dale Tronrud


On 4/7/2020 5:16 AM, Nicholas Keep wrote:
> I am at the point of depositing a low resolution (3.15 A) structure
> refined with REFMAC.  The average B factors were 31 before I added the
> TLS contribution as required for deposition which raised them to 157-
> this is flagged as a problem with the deposition, although this did not
> stop submssion.  The estimated Wilson B factor is 80.5 (although that
> will be quite uncertain) so somewhere between these two extremes.
> 
> Is it only the relative B factors of the chains that is at all
> informative?  Should I report the rather low values without TLS
> contribution or the rather high ones in any "Table 1"?  Comments
> appreciated.
> 
> Thanks
> 
> Nick
> 





Re: [ccp4bb] [3dem] Which resolution?

2020-03-08 Thread Dale Tronrud
   Just a note: James Holton said "true B-factor" not "true B-factors".
 I believe he was talking about the overall B not the individual B's.

Dale Tronrud

On 3/8/2020 3:25 PM, Rangana Warshamanage wrote:
> Sorry for not being clear enough.
> If B-factors at the end of refinement are the "true B-factors" then they
> represent a true property of data. They should be good enough to assess
> the model quality directly. This is what I meant by B factor validation.
> However, how far are the final B-factors similar to true B-factors is
> another question.
> 
> Rangana
> 
> 
> On Sun, Mar 8, 2020 at 7:06 PM Ethan A Merritt  <mailto:merr...@uw.edu>> wrote:
> 
> On Sunday, 8 March 2020 01:08:32 PDT Rangana Warshamanage wrote:
> > "The best estimate we have of the "true" B factor is the model B
> factors
> > we get at the end of refinement, once everything is converged,
> after we
> > have done all the building we can.  It is this "true B factor"
> that is a
> > property of the data, not the model, "
> >
> > If this is the case, why can't we use model B factors to validate our
> > structure? I know some people are skeptical about this approach
> because B
> > factors are refinable parameters.
> >
> > Rangana
> 
> It is not clear to me exactly what you are asking.
> 
> B factors _should_ be validated, precisely because they are refined
> parameters
> that are part of your model.   Where have you seen skepticism?
> 
> Maybe you thinking of the frequent question "should the averaged
> refined B
> equal the Wilson B reported by data processing?".  That discussion usual
> wanders off into explanations of why the Wilson B estimate is or is not
> reliable, what "average B" actually means, and so on.  For me the bottom
> line is that comparison of Bavg to the estimated Wilson B is an
> extremely
> weak validation test.  There are many better tests for model quality.
> 
>         Ethan
> 
> 
> 
> 
> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB=1
> 





Re: [ccp4bb] A question of density

2020-03-05 Thread Dale Tronrud
On 3/5/2020 11:00 AM, Jessica Besaw wrote:
> Hello Matthias,
> 
> Excellent point. Most of the the ordered water are easily visible at 2
> rmsd. The central disordered (or partially occupied) water becomes
> visible only at 1.3 - 1.4 rmsd, and it is very visible at 1 rmsd (which
> I have displayed all of the maps). In your opinion, do you think this
> would be noise? 

   "Noise" is a word that I hate.  Usually it is just a label we put on
something that we want an excuse to ignore.  If you decide to ignore
something you should be honest and have a specific reason.

   Personally, I think it is sufficient reason to not place an atom when
you are unsure if an atom should be placed.  With the density you have
shown, it is hard to imagine a consistent model that contains a
partially occupied water molecule in this little peak.  That "water
molecule" would be inconsistent with full occupancy of the atom to the
left, but that atom already has the lowest B factor when refined at full
occupancy.  These two atoms are too close together to both be present in
the same unit cell.

   Without a reasonable atomic model in mind, you can't build a
reasonable model.  Building a model is not simply the task of placing
atoms in peaks.  The resulting structure has to make physical sense.
What hydrogen bonding network are you proposing to exist when this rare
conformation is present?

   If you don't have confidence in a model, don't build it.  You will be
left with this little difference map peak, and you will take a tiny hit
on your R values.  Them's the breaks!  Putting in an atom you don't
believe would give you those slightly smaller R values, but is that
honest?   It seems to me that building a model that doesn't make
chemical sense just to lower R values borders on deception.

   Whether that makes this peak "noise" is irrelevant.  Maybe in five or
ten years someone will come up with a new tool for building overlapping,
partially occupied, water networks and this peak will be explained.
Would that change if this peak is "noise" or not?  All we have now is a
peak that we don't have a good way to interpret.

Dale Tronrud


> 
> Jessica
> 
> On Wed, 4 Mar 2020 at 12:45, Barone, Matthias  <mailto:bar...@fmp-berlin.de>> wrote:
> 
> hey Jessica
> 
> a tip that might come up later on anyway: once you put every
> reasonable bit into the desity, what I like to to when facing such
> blobbs: I take a well defined water out to create a diff density at
> a position where I know it is real. Having a feeling of how much you
> have to contour the diff density at that point can give you a good
> feeling how much of noise is actually in your density right in
> between the waters..?
> 
> best, matthias
> 
> 
> Dr. Matthias Barone
> 
> AG Kuehne, Rational Drug Design
> 
> Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP)
> Robert-Rössle-Strasse 10
> 13125 Berlin
> 
> Germany
> Phone: +49 (0)30 94793-284
> 
> 
> *From:* CCP4 bulletin board  <mailto:CCP4BB@JISCMAIL.AC.UK>> on behalf of Jessica Besaw
> mailto:jbesaw1...@gmail.com>>
> *Sent:* Wednesday, March 4, 2020 6:42:34 PM
> *To:* CCP4BB@JISCMAIL.AC.UK <mailto:CCP4BB@JISCMAIL.AC.UK>
> *Subject:* Re: [ccp4bb] A question of density
>  
> Hey Nukri,
> 
> Here are the details: Rwork/Rfree = 0.21 / 0.23 for a 2 Angstrom
> structure
> 
> I absolutely agree with you on the refinement. I did previously do
> that, and I attached the picture. 
> 
> What is the BB?
> 
> Cheers!
> 
> Jessica 
> 
> 
> 
> 
> 
> 
> On Wed, 4 Mar 2020 at 12:20, Nukri Sanishvili  <mailto:sannu...@gmail.com>> wrote:
> 
> Hi Jessica,
> You do not say how well is the rest of the structure refined.
> First, you should refine the structure best you can, without
> placing anything in the unclear blob of your interest so to
> obtain the best possible phases and hopefully improve the blob
> density as well.
> Then you should let the BB see what that density looks like.
> Looking at only the list of possibilities has very little value
> without seeing the density itself.
> Best wishes,
> Nukri
> 
> On Wed, Mar 4, 2020 at 11:10 AM Jessica Besaw
> mailto:jbesaw1...@gmail.com>> wrote:
> 
> Hello friends, 
> 
> I have a "blob" of density in an active site of my protein. 
> 
> I am struggling to determine if I should place a water in

Re: [ccp4bb] A question of density

2020-03-05 Thread Dale Tronrud
   Series termination is a problem when you leave out Fourier
coefficients that have significant amplitude.  Back in the old days when
we cut data aggressively it was something to worry about.  Now that most
everyone integrates down to very weak intensities it shouldn't be much
of a problem.

   Besides, as Glusker and Trueblood noted, a peak in a difference
(the green blob in this discussion) cannot be caused by series
termination.  If there is series termination the shape of a difference
map peak can be affected, but not the presence of the peak itself.
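
A one-dimensional toy calculation makes the point (all numbers 
hypothetical): because Fo and Fc are cut off at the same place, the 
termination errors cancel in the difference coefficients, and a peak 
only appears where the model really is missing something.

    import numpy as np

    # 1-D "crystal" of cell length 1, Fourier sums truncated at |h| <= 15
    x = np.linspace(0.0, 1.0, 1000, endpoint=False)
    h = np.arange(-15, 16)

    def F_atom(center, width=0.02):
        # Fourier coefficients of a Gaussian "atom" at the given position
        return np.exp(-2.0 * (np.pi * width * h)**2 - 2j * np.pi * h * center)

    def density(F):
        return np.real(np.sum(F[:, None] * np.exp(2j * np.pi * np.outer(h, x)), axis=0))

    F_obs      = F_atom(0.3) + F_atom(0.6)   # "true" two-atom structure
    F_complete = F_atom(0.3) + F_atom(0.6)   # model containing both atoms
    F_partial  = F_atom(0.3)                 # model missing the second atom

    print(density(F_obs).min())                       # < 0: termination ripples
    print(np.abs(density(F_obs - F_complete)).max())  # 0: no spurious difference peak
    print(x[np.argmax(density(F_obs - F_partial))])   # ~0.6: the genuinely missing atom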

Dale Tronrud

On 3/5/2020 12:20 PM, 0c2488af9525-dmarc-requ...@jiscmail.ac.uk wrote:
> Hello, not sure if anyone has mentioned series termination errors in the
> vicinity of electron dense atoms. The attached is from Glusker &
> Trueblood and might be of interest.
> 
> Jon Cooper
> 
> On 5 Mar 2020 19:00, Jessica Besaw  wrote:
> 
> Hello Matthias,
> 
> Excellent point. Most of the the ordered water are easily visible at
> 2 rmsd. The central disordered (or partially occupied) water becomes
> visible only at 1.3 - 1.4 rmsd, and it is very visible at 1 rmsd
> (which I have displayed all of the maps). In your opinion, do you
> think this would be noise? 
> 
> Jessica
> 
> On Wed, 4 Mar 2020 at 12:45, Barone, Matthias  <mailto:bar...@fmp-berlin.de>> wrote:
> 
> hey Jessica
> 
> a tip that might come up later on anyway: once you put every
> reasonable bit into the desity, what I like to to when facing
> such blobbs: I take a well defined water out to create a diff
> density at a position where I know it is real. Having a feeling
> of how much you have to contour the diff density at that point
> can give you a good feeling how much of noise is actually in
> your density right in between the waters..?
> 
> best, matthias
> 
> 
> Dr. Matthias Barone
> 
> AG Kuehne, Rational Drug Design
> 
> Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP)
> Robert-Rössle-Strasse 10
> 13125 Berlin
> 
> Germany
> Phone: +49 (0)30 94793-284
> 
> 
> 
> *From:* CCP4 bulletin board  <mailto:CCP4BB@JISCMAIL.AC.UK>> on behalf of Jessica Besaw
> mailto:jbesaw1...@gmail.com>>
> *Sent:* Wednesday, March 4, 2020 6:42:34 PM
> *To:* CCP4BB@JISCMAIL.AC.UK <mailto:CCP4BB@JISCMAIL.AC.UK>
> *Subject:* Re: [ccp4bb] A question of density
>  
> Hey Nukri,
> 
> Here are the details: Rwork/Rfree = 0.21 / 0.23 for a 2 Angstrom
> structure
> 
> I absolutely agree with you on the refinement. I did previously
> do that, and I attached the picture. 
> 
> What is the BB?
> 
> Cheers!
> 
> Jessica 
> 
> 
> 
> 
> 
> 
> On Wed, 4 Mar 2020 at 12:20, Nukri Sanishvili
> mailto:sannu...@gmail.com>> wrote:
> 
> Hi Jessica,
> You do not say how well is the rest of the structure refined.
> First, you should refine the structure best you can, without
> placing anything in the unclear blob of your interest so to
> obtain the best possible phases and hopefully improve the
> blob density as well.
> Then you should let the BB see what that density looks like.
> Looking at only the list of possibilities has very little
> value without seeing the density itself.
> Best wishes,
> Nukri
> 
> On Wed, Mar 4, 2020 at 11:10 AM Jessica Besaw
> mailto:jbesaw1...@gmail.com>> wrote:
> 
> Hello friends, 
> 
> I have a "blob" of density in an active site of my protein. 
> 
> I am struggling to determine if I should place a water
> in this spot, if I should model it as a disordered
> water, if the density may be a ligand that I have not
> considered, or if it should be left as unaccounted for
> density. I would like to publish this structure without
> compromising the science.
> 
> I have attached several possibilities that I have
> considered below. 
> 
> Any suggestions would be appreciated.
> 
> Cheers!
> 
> Jessica Besaw
> 
> 
> 
> 
> 

Re: [ccp4bb] Hydrogens in PDB File

2020-03-02 Thread Dale Tronrud
On 3/2/2020 10:12 AM, Alexander Aleshin wrote:
> Dear Dale,
> You raised a very important issue that has been overly ignored by the 
> crystallographic community. The riding hydrogens are just a tip of an 
> iceberg. It is absolutely unclear even to an experienced crystallographer how 
> to treat poorly ordered side chains or even whole residues. As a matter of 
> fact,  their models are "riding atoms", and consumers have no clue how much 
> they can trust our modeling.

   Oh no!  Now I've opened up this can of worms.

   The matter of describing completely disordered side chains has been
discussed heavily on this BB, along with the advantages and shortcomings
of overloading the meaning of "B factor" or "Occupancy" to describe this
situation.

   Using one data item to describe multiple things is never a good idea,
in my opinion.  The move to mmCIF for model storage does open the
possibility of adding new tags to uniquely describe model properties.
Creating such a tag for "place-holder" side chain atoms was one of the
recommendations in "Outcome of the First wwPDB/CCDC/D3R Ligand
Validation Workshop" (https://www.ncbi.nlm.nih.gov/pubmed/27050687).  I
don't know the status of the implementation of any of these
recommendations.  The wheels of the wwPDB grind exceedingly slowly.

   This is just another part of the huge problem of describing the
nature of the deposited model and the origin of the information
supporting all of its parts.

1) Riding Hydrogen atoms vs. free-floating and refined
2) Placeholder side chains vs. visible in density
3) Placeholder loops vs. visible in density
4) TLS anisotropic B's vs. restrained individual atom aniso vs.
unrestrained individual - The options here are many and multiple types
may be present in one model
5) NCS restraint/constraint - The options here are many and multiple
types may be present in one model
6) Concerted alternative conformation spread over multiple residues
7) Sequence heterogeneity

These are just the topics that bother me with almost every model I
download.  I'm sure there are plenty of other topics that don't come to
mind at the moment.

   With the move to mmCIF we now have the opportunity to create
descriptions of these model properties w/o just adding more and more
REMARK cards.

   Until that wondrous day arrives we are stuck with trying to create
models that are useful to the general community and provide minimal
opportunity for confusion.  We can argue as to where that line is, but
shouldn't lose sight of the ultimate goal.

   Ethan and I disagree over the relative damage caused by the inclusion
of riding hydrogen atom positions vs. the confusion that results from
their absence.  I think we agree strongly that all of the list items
above need to be tackled by the wwPDB and are of extreme importance.  I
think we need a comprehensive solution, not a piecemeal, special case,
for each.

Dale Tronrud

> Moreover, some programs (including the version of Pymol that I use), get 
> confused when they detect residues with multiple conformations. Like my Pymol 
> version fails to build cartoon elements in those areas, and it is not obvious 
> for a beginner how to remove the alternative conformations. I presume many 
> consumers just ignore such structures and pick up analogues that are 
> displayed without problems.  
> 
> Pymol developers, is it so difficult to report a user, when a structure is 
> loaded that it has residues with alternative conformations, and one of 
> conformers should be hidden for a correct presentation of the secondary 
> structure elements?
> 
> Alex
> 
> 
> 
> On 3/2/20, 12:57 AM, "CCP4 bulletin board on behalf of Dale Tronrud" 
>  wrote:
> 
> [EXTERNAL EMAIL]
> 
> Dear Tim,
> 
>I am in agreement with Ethan and you that a complete description of
> the restraints and constraints applied to the model should be included
> in the deposition.  This is currently a major failing of the wwPDB.  For
> hydrogen atoms we, at least, have the "Riding hydrogen atoms were added"
> remark but that simple statement is inadequate to allow anyone (or
> program) to reproduce what the depositor had on disk before the hydrogen
> atoms were redacted.  We know that shelxl and MolProbity produce
> hydrogen models that differ, and that shelxl requires additional
> information about the temperature of the molecule at least.
> 
>How could someone hope to develop a better technique for generating
> hydrogen atom models if the results could never be deposited and used?
> 
>There is an additional matter of practical importance.  While the two
> of us share a lack of confidence in the care taken by some depositors in
> the creation of hydrogen atoms, I believe th

Re: [ccp4bb] Hydrogens in PDB File

2020-03-02 Thread Dale Tronrud
Dear Tim,

   I am in agreement with Ethan and you that a complete description of
the restraints and constraints applied to the model should be included
in the deposition.  This is currently a major failing of the wwPDB.  For
hydrogen atoms we, at least, have the "Riding hydrogen atoms were added"
remark but that simple statement is inadequate to allow anyone (or
program) to reproduce what the depositor had on disk before the hydrogen
atoms were redacted.  We know that shelxl and MolProbity produce
hydrogen models that differ, and that shelxl requires additional
information about the temperature of the molecule at least.

   How could someone hope to develop a better technique for generating
hydrogen atom models if the results could never be deposited and used?

   There is an additional matter of practical importance.  While the two
of us share a lack of confidence in the care taken by some depositors in
the creation of hydrogen atoms, I believe the PDB customers are even
worst.  If a crystallographer or microscopist should not be trusted to
add hydrogen atoms should we expect an undergrad or, maybe, a high
school student to do better?

   When someone downloads a model they expect they will be able to use
that model without performing a host of technical manipulations just to
be able to see where the depositor thought the atoms were located.  We
should certainly give them enough information to understand how those
atoms were placed (and we are failing at that), but anyone should be
able to fire up Coot, load a PDB and map, and make some sense of it.
Maybe someday Coot will be able to automatically generate hydrogen
atoms, but currently the files do not contain enough information for it
to do a reasonable job.

   If hydrogen atoms are to be deleted because they can, sort of, be
recalculated, there are other aspects of the PDB file that also could be
removed.  I think I could do a pretty good job of resurrecting deleted
CB atoms for any of the nineteen amino acids that contain them.  Should
we just drop all CB's and add a remark saying that their locations can
be inferred from the deposited atoms?
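
For what it's worth, the CB reconstruction really is close to mechanical. 
A sketch using exact tetrahedral geometry and a hypothetical idealized 
backbone (real rebuilding tools use refined Engh & Huber values rather 
than perfect tetrahedral angles):

    import numpy as np

    def rebuild_cb(n, ca, c, bond=1.53):
        # Place CB from backbone N, CA, C assuming an exactly tetrahedral CA.
        u = (n - ca) / np.linalg.norm(n - ca)      # unit vector CA->N
        v = (c - ca) / np.linalg.norm(c - ca)      # unit vector CA->C
        bisect = -(u + v) / np.linalg.norm(u + v)  # external bisector of N-CA-C
        normal = np.cross(u, v)
        normal /= np.linalg.norm(normal)           # out of the N-CA-C plane
        # The CB direction splits 1/sqrt(3) along the bisector and sqrt(2/3)
        # out of plane; the + sign on the out-of-plane term picks the hand
        # corresponding to the standard L-amino acid (worth checking against
        # a real residue before trusting it).
        direction = bisect / np.sqrt(3.0) + normal * np.sqrt(2.0 / 3.0)
        return ca + bond * direction

    # Hypothetical idealized backbone with CA at the origin
    N  = np.array([-0.525, 1.363, 0.000])
    CA = np.array([ 0.000, 0.000, 0.000])
    C  = np.array([ 1.526, 0.000, 0.000])
    print(rebuild_cb(N, CA, C))   # roughly (-0.50, -0.73, -1.25)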

Dale Tronrud

P.S. I realize that I am open to charges of inconsistency since I have
advocated not depositing an atomic model for atoms that weren't placed
by the depositor (i.e. disordered side chains).  I don't believe I'm
committing this sin.  I'm just saying if the depositor comes up with
locations for atoms they should be deposited.  If the location of an
atom is not known it should not be deposited.  I do not have a desire
for completeness for completeness' sake, just a complete listing of all
the atoms placed by the depositor.  Let that high school student see our
work in all its glory!


On 3/1/2020 4:53 AM, Tim Gruene wrote:
> Dear Dale,
> 
> your last sentence is of great importance:
> 
> "leaving the (hopefully) manually inspected and curated Hydrogen atoms in
> the deposited PDB"
> 
> I believe this hope is unrealistic. Most people do probably not think or 
> understand what refinement programs do about hydrogen atoms. In Refmac5 it 
> has 
> long been an option to generate hydrogen atoms for refinement but do not put 
> them out into the PDB file. Like Ethan, I believe this is best practice. Of 
> course, in case hydrogen atoms have been curated, one may leave them in for 
> deposition. It is not useful to see all the H-atoms in Coot, and chemists 
> omit 
> hydrogen atoms as well even for 2D drawings.
> 
> @Matthew Whitley: Adding hydrogen atoms in calculated (riding)  positions 
> should be rather independent of resolution of the data, since their major 
> role 
> is in improving anti-bumping restraints, and since their major contribution 
> to 
> the diffraction data is in the low resolution data. 
> 
> Best,
> Tim
> 
> 
> On Sunday, March 1, 2020 9:26:29 AM CET Dale Tronrud wrote:
>> Dear Ethan,
>>
>>To move away from an abstract discussion of hydrogen atoms I'd like
>> to describe a concrete example.  In 2008 I deposited a model of the FMO
>> (Bacteriochlorophyll containing) protein.  The ID code is 3EOJ.  The
>> model was refined to a data set cut off at 1.3 A resolution using the
>> criteria of the day.  I used shelxl for the final stage of refinement
>> and added riding hydrogen atoms to the mix.  When I deposited the model
>> I succumb to peer pressure and removed the hydrogen atoms.
>>
>>If you look at the map calculate by the Electron Density Server you
>> will see many peaks in the Fo-Fc map indicating the missing hydrogen
>> atoms.  (I have attached a screen-shot from Coot but I recommend that
>> you fire up Coot and explore the map yourself.)  In my picture you can
>> see the three peaks around a methyl group.  Above and to the left is the
>> peak for the hydrogen of a CH b

[ccp4bb] Probable bugs in acedrg

2020-02-02 Thread Dale Tronrud
Hi,

   I have been exploring a PDB model that contains an RNA molecule with
a ligand stuck on the 3' phosphate.  To perform real-space refinement in
Coot I needed to create a standard geometry CIF for the ligand and one
for the linkage between the ligand and an Adenine.  The CIF for the
ligand in CCP4 and the PDBe has (in my opinion) serious defects, but
with help from the masters at Global Phasing I was able to get a fine
set of restraints from the Grade server.

   To generate the linkage CIF I followed the very clear instructions at

https://www2.mrc-lmb.cam.ac.uk/groups/murshudov/content/acedrg/acedrg.html

to use acedrg.  Long story short, one creates a one line description of
the link and feeds that into acedrg along with the two relevant CIFs and
Bob's your uncle.

   acedrg didn't work for me and I think I've uncovered two bugs that I
had to work-around to get my file.  I don't know who to report these
bugs to -- hence this letter.

   I typed the command to cause acedrg to produce my link, but it failed
saying (among other things)

The component A.cif contains a deloc or aromatic bond. It needs to be
kekulized.
You could use acedrg to do kekulication of the bonds.
Try getting A.cif from PDB/CCD and then use it as an input file
to run acedrg to generate a cif file for the ligand description.
e.g. You get the file A.cif from PDB/CCD and then,
acedrg -c A.cif -o A_fromAcedrg
...

The suggested second run of acedrg fails with the message (again truncated)

The system contains atoms of the following elements
P   O   C   N   H   "H5'
The input ligands/molecules contains metal or other heavier atoms
Acedrg currently deals with ligands/molecules with following elements only
C, N, O, S, P, B, F, Cl, Br, I, H

   There is no atom in A.cif from the CCP4 library with element "H5'.
It does contain an atom with the odd name "H5' ".  Knowing that many
programs have problems with embedded spaces in names, and
seeing no need for this embedded space, I changed the atom name and
tried again.  acedrg now completes the operation and writes out a new cif.

   This is bug #1.
   One might also consider the space in the name of this hydrogen atom
to be a bug, but A.cif is a widely used file and it works for its 
principal purposes.
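
For anyone hitting the same wall before the bug is fixed, one blunt way 
to do the rename (not CIF-aware, and it assumes the offending name 
appears in the file as the double-quoted token "H5' " with the trailing 
space -- check the result by eye):

    # Rename the atom so acedrg no longer chokes on the embedded space
    with open("A.cif") as fh:
        text = fh.read()
    with open("A_fixed.cif", "w") as fh:
        fh.write(text.replace('"H5\' "', '"H5\'"'))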

   Unfortunately, using the modified cif in the command for creating the
link still fails.  It appears that the kekulizer failed to kekulate.  To
investigate this kekulonic fault I examined the kekuleated cif and found
that the "delocalized" bonds of the phosphate were not affected by the
kekuliator.  I attempted to manually kekulinate it by changing these
bonds to a mixture of single and double bonds, with some trial and error
required to get an acceptable mix, and created a cif that was now
acceptably kekulinzed.  This file was used by acedrg and I got the link
I wanted.

   Bug #2 is the failure of acedrg's kekulinifier to remove all of the
delocalized bonds.

   I hope this note makes it to acedrg's developers and they find it
useful.  It may also be helpful to others attempting this task (at least
until the bugs are fixed).

Dale Tronrud





Re: [ccp4bb] Are there any proteins capable of crystallizing at a wide range of pH having the same space group?

2019-09-19 Thread Dale Tronrud
   My recollection is that gamma-chymotrypsin crystals will persist in
pHs all the way from 3 to 9.  I don't know if the crystals will grow
over that range.

   There are a fair number of phage T4 lysozyme variants in the PDB.  I
don't think this is considered "overpopulation" but a valuable
contribution to humanity.  ;-)

Dale Tronrud

On 9/19/2019 4:03 PM, Murpholino Peligro wrote:
> A quick glance at the entries of hen egg white lysozyme in the PDB show
> that it can be crystallized at different pH values, but the space group
> is not always the same. I still have to refine the analysis but I was
> wondering that maybe there are a few proteins that can crystallize at a
> wide range (maybe not that wide) of pH values and still have the same
> space group?
> 
> To refine the analysis a wee further: By any chance do you know any
> proteins overpopulating the PDB (i.e. besides HEWL)?
> 
> Lots of thanks as always.
> 
> Murphy
> 
> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB=1
> 





Re: [ccp4bb] Problem in real space - please sign & invite other scientists to sign this letter

2019-08-20 Thread Dale Tronrud
Dear Dr Himmel,

   I certainly agree that a scientist getting into the nitty-gritty of a
field outside of their training can make quite silly mistakes.  The link
you have provided to us makes an excellent example of your point.

   Dr Akasofu is a trained space scientist who studied the aurora
borealis and then spent most of his career in administration. (According
to the blog post)  Since his childhood he has disbelieved most anything
proposed by experts in whatever field.  Now we are expected to take
seriously his vague ramblings about "cycles" in climate?  What evidence
does he present?  Nothing but speculation that the climate "might" be
driven by this or "might" be driven by that.  Dr Akasofu brings nothing
to the table.

Dale Tronrud

On 8/20/2019 6:23 PM, Daniel M. Himmel, Ph. D. wrote:
> Dear colleagues,
> 
>  
> 
> Since when does being a structural biologist make us experts in
> climatology, 
> 
> and isn't it a breach of basic ethical practice and professionalism as
> scientists 
> 
> to sign on as authors to an article for which we have neither contributed
> 
> research nor intellectual content of the manuscript?  Are we now going
> against 
> 
> the standard to which the editorial policies of leading reputable
> biological 
> 
> journals normally hold us as authors?  And doesn't it hurt the credibility 
> 
> of a serious scientific article, its authors, and the journal in which
> it appears 
> 
> if biologists with no expertise in earth science/astrophysics appear 
> 
> without humility as authors to such an article?
> 
>  
> 
> Are you not embarrassed to put your name to an article that uses physical
> 
> sciences data as a platform for preaching about religion, politics, and
> economic
> 
> theory ("...social and economic justice for all...")?
> 
>  
> 
> Does it not upset you when someone unfamiliar with structural biology draws
> 
> firm conclusions that heavily depend on the part of a structural model
> that has high
> 
> B-factors?  So why are you unconcerned that you may be guilty of an
> analogous
> 
> error when, as structural biologists, you put your name to a
> controversial interpretation 
> 
> of selected earth science data?  See, for example,
> 
> https://blogs.agu.org/geospace/2017/02/24/living-warm-peak-ice-ages/
> about the ways 
> 
> climate data can be misinterpreted by choosing too tight a time
> interval, and lets stick to 
> 
> structural biology and allied sciences in the CCP4 list, please.
> 
>  
> 
> Respectfully,
> 
> Daniel M. Himmel
> 
>  
> 
> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB=1
> 





Re: [ccp4bb] Extra density close to phosphate bound to Zn2+

2019-08-05 Thread Dale Tronrud
   One thing to remember is that Li+ only has two electrons.  It should
be a little more than twice the density of a hydrogen (because the
charge pulls in the electron cloud).  If you are seeing lithium at 8
"sigma" you should be seeing the well located hydrogen atoms at 4
"sigma".  Are you?

   I always like controls.  Take out one of the nearby water molecules 
and see how that known difference peak compares to your mystery peak.
 A Li+ should have about 1/5th the height of an H2O.  If you like the
hypothesis of two orientations of the PO4 group, the relative height of
the two peaks will give insight to the occupancy ratio.

   Remember, if the PO4 has two orientations in the crystal, the water
molecules it (they?) is/are bound to will likely also have alternatives.

Dale Tronrud

On 8/5/2019 6:05 AM, Maria Håkansson wrote:
> Dear CCP4 bulletin board,
> 
> I am working with some lytic enzymes called endolysines, which
> bind Zn2+ in the active site. I have three homologues protein
> determined to 1.2 Å each where the Zn2+ is bound
> to a cystein, two histidines and one phosphate ion added (1.9-2.3 Å
> binding distances) in the crystallization experiment.
> 
> Now to my question. Close to the phosphate (B-factor=20Å2) ion a 8 sigma
> peak is present in all three endolysines, see below.
> I have modeled it to a Na+ (B-factor= 30 Å2) or a Li+ (B factor = 13Å2)
> ion. 
> Sodium has been added in the crystallization experiments since sodium
> potassium phosphate
> salt has been used. The only reason for including Li+ is that I think
> the binding distances (1.7-2.0 Å) are too short for Na+.
> 
> I have also tried to make a model with the phosphate in two different
> conformations but it does not fit.
> 
> Has anyone seen something similar before? What is the most correct way
> of dealing with unknown densities?
> It is difficult to disregard +8 sigma difference density close to the
> active site.
> 
> Thanks in advance for any help!
> 
> Best regards,
> Maria
> 
>  
> 
> Maria Håkansson, PhD, Crystallization Facility Manager
> Principal Scientist
> 
> SARomics Biostructures AB
> Medicon Village
> SE-223 81 Lund, Sweden
> 
> Mobile: +46 (0)76 8585706
> Web: www.saromics.com <http://www.saromics.com>
> 
> 
> 
> 
> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB=1
> 





Re: [ccp4bb] (EXTERNAL) Re: [ccp4bb] acceptable difference between Average B-factor and Wilson B

2019-03-12 Thread Dale Tronrud
   I agree completely!  The higher resolution data is determined
entirely by the atoms with low B factor.  In fact, the Wilson B plots 
I've seen have a distinct curve to them -- they are not straight lines. 
 As one looks to higher and higher resolution the curve gets shallower 
and shallower.  You can actually see that the population of atoms 
contributing at each successively higher resolution has a lower and lower average B.

Dale Tronrud

On 3/12/2019 2:24 PM, Edward A. Berry wrote:
> What if you have one domain with many B-factors around 70 and above,
> another domain with B-factors around 20? The atoms with high B-factor
> will make essentially no contribution to the intensity of spots beyond 3
> A, and so have no effect on the slope of the Wilson plot byond that. But
> they will contribute mightily to the average atomic B. Or so it seems to
> me.
> eab
> 
> On 03/12/2019 04:39 PM, DUMAS Philippe (IGBMC) wrote:
>>
>> Le Mardi 12 Mars 2019 19:55 CET, Dale Tronrud 
>> a écrit:
>>
>> Dale
>> Good to have the opportunity of going back to the crystallography of
>> the fifties in these post-modern times...
>> There is an essential argumentation that should be recalled. The only
>> reason for the fact that one ignores low-resolution data in a Wilson
>> plot is that a Wilson plot is based precisely upon Wilson statistics,
>> which assumes that the atoms are cast randomly in the unit cell.
>> This assumption obviously does not hold at low resolution and there is
>> no reason to obtain a straight line that stems from the latter
>> assumption.
>> Therefore, I do not think one may say that a Wilson plot tends to
>> ignore atoms with high B values.
>> Consequence: if one has data at rather low resolution, a Wilson plot
>> is inherently inaccurate, but if one has data at very high resolution,
>> the Wilson plot should give a very decent estimate of the average B
>> and any significant discrepancy should ring the bell.
>> Philippe Dumas
>>
>>
>>>     The numeric average of the B factors of the atoms in your model only
>>> roughly corresponds to the calculation of the Wilson B.  While I always
>>> expect the average B to be larger than the Wilson B, how much larger
>>
>>> depends on many factors, making it a fairly useless criteria for judging
>>> the correctness of a model.
>>>
>>>     While it is pretty easy to understand the average of the B
>>> factors in
>>> your model, the Wilson B is more difficult.  Since it is calculated by
>>> drawing a line though the (Log of) the intensity of your structure
>>> factors as a function of the square of sin theta over lambda, it is
>>> rather removed from the atomic B factors.  When drawing the line the low
>>> resolution data are ignored because those data don't fall on a straight
>>> line, and this means that the large B factor atoms in your model are
>>
>>> ignored in the Wilson B calculation.
>>>
>>>     The Wilson B is (sort of) a weighted average of the B factors of
>>> your
>>> model, with the smallest B's given the largest weight.  The actually
>>
>>> weighting factor is a little obscure so I don't know how to simulate it
>>> to adjust the averaging of atomic B's to come out a match.  The easiest
>>> way to compare your model to the Wilson B is to calculate structure
>>> factors from it and calculate the Calculated Wilson B.  No one does this
>>> because it will always come out as a match.  If your calculated Wilson B
>>> doesn't match the observed Wilson B your R values are guaranteed to be
>>> unacceptable and your refinement program will have to be malfunctioning
>>> to create such a model.
>>>
>>>     If all the B factors in your model are equal to each other, your
>>> refined model will have an average B that matches the Wilson B, because
>>> weighting doesn't matter in that situation.  If you allow the B's to
>>
>>> vary, the difference between the average and the Wilson B will depend on
>>> how high of an individual B factor you are willing to tolerate.  If you
>>> are a person who likes to build chain into weak loops of density, or
>>
>>> build side chains where there is little to no density, then your average
>>> B will be much larger than the Wilson B.  This does not mean there is an
>>> error, it is simply a reflection of the Wilson B's insensitivity to
>>> atoms with large B.
>>>
>>>     I do not believe comparing the average B to the Wilson B has any
>>> utility at all.
>>>
>>> Dale Tronrud

Re: [ccp4bb] acceptable difference between Average B-factor and Wilson B

2019-03-12 Thread Dale Tronrud
   The numeric average of the B factors of the atoms in your model only
roughly corresponds to the calculation of the Wilson B.  While I always
expect the average B to be larger than the Wilson B, how much larger
depends on many factors, making it a fairly useless criteria for judging
the correctness of a model.

   While it is pretty easy to understand the average of the B factors in
your model, the Wilson B is more difficult.  Since it is calculated by
drawing a line though the (Log of) the intensity of your structure
factors as a function of the square of sin theta over lambda, it is
rather removed from the atomic B factors.  When drawing the line the low
resolution data are ignored because those data don't fall on a straight
line, and this means that the large B factor atoms in your model are
ignored in the Wilson B calculation.

   The Wilson B is (sort of) a weighted average of the B factors of your
model, with the smallest B's given the largest weight.  The actual
weighting factor is a little obscure so I don't know how to simulate it
to adjust the averaging of atomic B's to come out a match.  The easiest
way to compare your model to the Wilson B is to calculate structure
factors from it and calculate the Calculated Wilson B.  No one does this
because it will always come out as a match.  If your calculated Wilson B
doesn't match the observed Wilson B your R values are guaranteed to be
unacceptable and your refinement program will have to be malfunctioning
to create such a model.

   If all the B factors in your model are equal to each other, your
refined model will have an average B that matches the Wilson B, because
weighting doesn't matter in that situation.  If you allow the B's to
vary, the difference between the average and the Wilson B will depend on
how high of an individual B factor you are willing to tolerate.  If you
are a person who likes to build chain into weak loops of density, or
build side chains where there is little to no density, then your average
B will be much larger than the Wilson B.  This does not mean there is an
error, it is simply a reflection of the Wilson B's insensitivity to
atoms with large B.
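
A quick numerical toy (hypothetical B-factor distribution, equal point 
atoms, no form-factor or solvent terms) shows how strongly the Wilson 
fit is pulled toward the low-B atoms:

    import numpy as np

    # Skewed B distribution: many well-ordered atoms plus a high-B tail (A^2)
    rng = np.random.default_rng(0)
    B = 15.0 + rng.gamma(shape=1.5, scale=20.0, size=5000)
    print(f"average model B: {B.mean():.1f} A^2")

    # Mean intensity falloff of point atoms carrying these B factors
    d = np.linspace(3.5, 1.6, 50)          # resolution shells (A)
    x = 1.0 / (4.0 * d**2)                 # (sin(theta)/lambda)^2
    mean_I = np.array([np.exp(-2.0 * B * xi).mean() for xi in x])

    # Straight-line fit of ln<I> against x; the slope is -2 * B_wilson
    slope, intercept = np.polyfit(x, np.log(mean_I), 1)
    print(f"apparent Wilson B: {-slope / 2.0:.1f} A^2")

The average B comes out well above the fitted Wilson B, and the "line" 
is in fact gently curved, for exactly the reasons given above.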

   I do not believe comparing the average B to the Wilson B has any
utility at all.

Dale Tronrud

On 3/12/2019 11:34 AM, Eze Chivi wrote:
> Dear CCP4bb community,
> 
> 
> The average B-factor (calculated from model) of my protein is 65,
> whereas the Wilson B is 52. I have read in this BB that "it is expected
> that average B does not deviate strongly from the Wilson B". How can I
> evaluate whether the difference calculated for my data is reasonable or not?
> 
> 
> Thank you in advance
> 
> 
> Ezequiel
> 
> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB=1
> 





Re: [ccp4bb] Confused about centric reflections

2019-03-02 Thread Dale Tronrud
   You are correct, other than your typo.  The centric zone in a
monoclinic space group (B setting) is h0l.

   This web site is a wiki so you should be able to correct it yourself.
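
For anyone who wants to convince themselves, a two-line check (assuming 
the b-unique setting, whose only non-identity reciprocal-space operator 
is (h,k,l) -> (-h,k,-l)):

    def is_centric_p2(h, k, l):
        # Centric if the symmetry-mapped index equals the Friedel mate (-h,-k,-l)
        return (-h, k, -l) == (-h, -k, -l)   # true exactly when k == 0, the h0l zone

    print(is_centric_p2(3, 0, 5), is_centric_p2(0, 4, 0))   # True False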

Dale Tronrud

On 3/2/2019 2:00 PM, Edward A. Berry wrote:
> The wiki:
> 
> https://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Centric_and_acentric_reflections
> 
> 
> says:
> "A reflection is centric if there is a reciprocal space symmetry
> operator which maps it onto itself (or rather its Friedel mate).
> . . .
> Centric reflections in space group P2 and P21 are thus those with 0,k,0."
> 
> The operator -h,k,-l does NOT take 0,k,0 to its Friedel mate.
> it takes h,0,k to their Friedel mates. In other words the plane
> perpendicular to the 2-fold axis, at 0 along the axis
> 
> Or am I missing something?
> eab
> 
> 
> 
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB=1
> 





Re: [ccp4bb] Experimental phasing vs molecular replacement

2018-12-06 Thread Dale Tronrud
   What may be counter-intuitive when looked at in one way may be
perfectly expected from another point of view.  I look at these maps as
the result of a single cycle of steepest descent refinement where the
parameters are the density values of the map sampled on the grid.  If
you start with a map calculated from the coefficients

(|Fcalc|, PhiCalc)

One cycle of steepest descent gives the shift

2(|Fobs|-|Fcalc|, PhiCalc)

giving a new and improved map with the coefficients

(|Fcalc|, PhiCalc) + 2(|Fobs|-|Fcalc|, PhiCalc)
   = (|Fcalc| + 2|Fobs| -2|Fcalc|, PhiCalc)
   = (2|Fobs| - |Fcalc|, PhiCalc)

or the classic 2Fo-Fc map.

   If you start with (|Fobs|, PhiObs) then your shift will be zero
because the R value is already perfect.  You cannot improve an
experimental map unless you refine against other criteria.

   On the other hand, if you start with (|Fcalc|, PhiObs) you have to
question your sanity a bit because |Fobs| is so much better, in fact
perfect.  If you decide to press ahead anyway you find that the
coefficients of the updated map are (2|Fobs| - |Fcalc|, PhiObs).  These
are better than (|Fcalc|, PhiObs) but still not as good as (|Fobs|, PhiObs).

   There really is no justification for simply attaching the observed
phases to the calculated amplitudes.  The reason we are doing atomic
model refinement (instead of density map refinement as described above)
is to impose a lot of external knowledge such as atomic shape, solvent
flatness, and various relationships between atoms.  All of that
information gets encoded in the Fcalc's (complex numbers) so their
amplitudes and phases are tightly coupled.  It is no surprise that just
ripping out half and replacing it with something else would lower the
quality of the contained information.

   If you want a map to help evaluate your model when you have
experimental phase information you should run one cycle of steepest
descent optimization on the map with both amplitudes and phases
restrained.  If I ignore the complication of much larger uncertainty of
the phase relative to the amplitude, I believe the single cycle shift is
a map calculated from the complex coefficients 2(Fobs-Fcalc) and this is
your "difference map".  The 2Fo-Fc equivalent would have the
coefficients 2Fobs-Fcalc.  Remember they are all complex numbers with
their proper phases.

   I did this derivation in Least-Squares formalism so I can't be
confident of the m's and D's.  I also assumed that Friedel's Law holds,
but that assumption was made with the traditional maps as well.
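
   In case it helps to see the bookkeeping spelled out, here is a toy
numpy version of the coefficient recipes above (the array names are
mine, and the m's and D's are again left out):

  import numpy as np

  def map_coefficients(f_obs, f_calc, phi_calc_deg):
      """Return (2|Fobs|-|Fcalc|, PhiCalc) and (|Fobs|-|Fcalc|, PhiCalc)
      as complex Fourier coefficients."""
      phi = np.deg2rad(phi_calc_deg)
      fc = f_calc * np.exp(1j * phi)            # complex Fcalc
      fo_on_phic = f_obs * np.exp(1j * phi)     # |Fobs| carried on PhiCalc
      return 2.0 * fo_on_phic - fc, fo_on_phic - fc

  # With experimental phases phi_obs in hand, the complex-difference map
  # discussed above would instead use 2*(f_obs*exp(1j*phi_obs) - fc), and
  # the 2Fo-Fc equivalent 2*f_obs*exp(1j*phi_obs) - fc.
  two_fofc, fofc = map_coefficients(np.array([310.0, 120.0]),
                                    np.array([290.0, 150.0]),
                                    np.array([45.0, -60.0]))
  print(np.abs(two_fofc), np.rad2deg(np.angle(two_fofc)))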

Dale Tronrud


On 12/6/2018 11:01 AM, James Holton wrote:
> Sorry for the confusion, I was going for brevity.
> 
> Any time you do a thought experiment you make a fake-data data set, the
> "true" phases and "true" amplitudes become the ones you put into the
> simulation process.  This is by definition.  Is there potential for
> circular reasoning?  Of course!  But you can do controls:
> 
>   If you start with an ordinary single-conformer coordinate model and
> flat bulk solvent from refmac to make your Ftrue, then what you will
> find is that even after adding all plausible experimental errors to the
> data the final Rwork/Rfree invariably drop to small-molecule levels of
> 3-4%.  This is true even if you prune the structure back, shake it, and
> rebuild it in various ways.  The difference features always guide you
> back to Rwork/Rfree = 3/4%. However, if you refine with phenix.refine,
> you will find Rwork/Rfree stall at around 10-11%.  This is because Ftrue
> came from refmac and refmac and phenix.refine have somewhat different
> bulk solvent models.  If Ftrue comes from phenix and you refine with
> refmac you get similar "high" R values.  High for a small molecule
> anyway. And, of course, if you get Ftrue from phenix and refine with
> phenix you also get final Rwork/Rfree = 3/4%. If you do more things that
> automated building doesn't do, like multi-headed side chains, or get the
> bulk solvent from an MD simulation, then you can get "realistic"
> Rwork/Rfree in the 20%s.  All of this is the main conclusion from this
> paper: https://dx.doi.org/10./febs.12922
> 
> But, in all these situations with various types of "systematic error"
> thrown in, because you know Ftrue and PHItrue you can compare different
> kinds of maps to this ground "truth" and see which is closest when you
> compare electron density. In my experience, this is the 2mFo-DFc map,
> phased with PHIcalc from the model. You might think that replacing
> PHIcalc with PHItrue would make the map even better because PHItrue is a
> "better" phase than PHIcalc, but it turns out this actually make things
> worse!  That's what is counter-intuitive: 2mFo-DFc amplitudes are
> "designed" to be used with the slightly-wrong phase of PHIcalc, not
> PH

Re: [ccp4bb] high B factor

2018-11-11 Thread Dale Tronrud
   Okay, I'm an idiot.  The letter did say molecular replacement.  I
have no excuse for being so blind.

   I do stand by my statement: I don't see that an average B factor of
82 A^2, by itself, indicates any serious problem with a model at 2.65 A
resolution.

   I have generally found that the average B factor of a model is
uninformative.  All refinement programs will make certain that the fall
off in intensity of the calculated F's matches that of the observations,
even if the model is completely wrong.  The average B factor can vary
widely between different models of the same data if the distribution of
individual B's differ.

   For example, a model built by a person in the "never truncate side
chains" camp will have a higher average B factor than the model built by
a "truncater".  That just means that the high end tail of the B factor
distribution has been cut off.  Atoms with B factors in the 200's only
affect the intensities of a very small number of reflections anyway and
do not have any effect on the Wilson B.

   If a structure has a wobbly domain or an unusual amount of surface
area exposed to solvent, the model will have a lot of atoms with large B
factors.  The average B factor will be high, but none-the-less the
majority of the atoms will be well ordered and the Wilson B small.  This
is not an indication of error.  It is exactly what one would expect from
such a structure.

   We don't know what the distribution of individual B factors is in Dr.
Anandan's model.  We don't know if Dr. Anandan truncates side chains or
has a liking for building protein main chain through weak density.  We
don't know if there are multiple copies of the protein in the asymmetric
unit, nor if some are more ordered than others.

   All we know is that the resolution of the data is 2.65 A (whatever
that means), that the B factor average is 82 A^2 (whatever that means)
and that there is one atom with a B factor of 34.57 A^2 and another with
225.13 A^2.  I just don't see that these few facts indicate a problem.

   If Dr. Anandan has some reason to distrust this model, like maybe not
being able to detect the image of the allegedly bound ligand, a great
number of details of the protein, data collection and the current model
would have to be shared.  When asking such a question on this BB you
should never be concerned that your letter is too long.

Dale Tronrud

On 11/11/2018 1:06 PM, Dale Tronrud wrote:
> 
>Did I miss a follow-up letter with more information?  All I've seen
> is that Dr. Anandan said that there was a model based on 2.65 A
> resolution data with an average B factor of 82 A^2.  Does this fact
> alone call for weeks of work attempting to remove model bias?  Dr.
> Anandan didn't even say this structure was solved via molecular
> replacement.  No indication of high R factor or stuck refinement. No
> details at all.  Just a model with a fairly reasonable, if somewhat low,
> average B factor.
> 
> Dale Tronrud
> 
> On 11/10/2018 6:31 PM, Daniel M. Himmel, Ph. D. wrote:
>> Anandhi,
>>
> Assuming the data reduction went well, and you're in the right space group,
>> there could be a lot of model bias in your structure stemming from the
>> starting model.  
>>
>> There are a lot of things to try.  I would
>> set all the B-factors to an artificially low B-factor to help de-mask
>> errors.  Then,
>> you can generate a composite omit map and FEM maps to see if any obvious
>> model errors show up in the electron density.  After correcting these,
>> you can try running
>> Autobuild and PhaseAndBuild in Phenix.  Compare all the models you get from
>> each of these, especially in regions where your original model had high
>> b-factors.
>> Use Coot to identify areas of poor agreement with electron density and areas
>> of poor geometry.  Once you spent a while correcting the model, put it
>> through
>> one or more cycles of simulated annealing at different temperatures in
>> parallel.
>> Select several sets of coordinates that give the best Rfree convergence,
>> and then
>> subject those models to individual B-factor refinement.  After that,
>> check again in Coot
>> for areas of high B-factors and areas of poor geometry (try especially
>> to improve
>> your Ramachandran Plot).  Use MolProbity to help in identifying errors
>> and clashes.
>> If a few rounds of simulated annealing and model building don't improve
>> things, try some 
>> refinement in CCP4 Refmac.  PDB_REDO, which uses Refmac, can also help 
>> give you alternative models.  While doing all these, don't be afraid to
>> "cut-and-paste"
>> regions from one model into another model and then correct the geometry
>> in Coot.
>> If B-factors don't come down no matter what you do, you

Re: [ccp4bb] high B factor

2018-11-11 Thread Dale Tronrud
   Did I miss a follow-up letter with more information?  All I've seen
is that Dr. Anandan said that there was a model based on 2.65 A
resolution data with an average B factor of 82 A^2.  Does this fact
alone call for weeks of work attempting to remove model bias?  Dr.
Anandan didn't even say this structure was solved via molecular
replacement.  No indication of high R factor or stuck refinement. No
details at all.  Just a model with a fairly reasonable, if somewhat low,
average B factor.

Dale Tronrud

On 11/10/2018 6:31 PM, Daniel M. Himmel, Ph. D. wrote:
> Anandhi,
> 
> Assuming the data reduction went well, and you're in the right space group,
> there could be a lot of model bias in your structure stemming from the
> starting model.  
> 
> There are a lot of things to try.  I would
> set all the B-factors to an artificially low B-factor to help de-mask
> errors.  Then,
> you can generate a composite omit map and FEM maps to see if any obvious
> model errors show up in the electron density.  After correcting these,
> you can try running
> Autobuild and PhaseAndBuild in Phenix.  Compare all the models you get from
> each of these, especially in regions where your original model had high
> b-factors.
> Use Coot to identify areas of poor agreement with electron density and areas
> of poor geometry.  Once you spent a while correcting the model, put it
> through
> one or more cycles of simulated annealing at different temperatures in
> parallel.
> Select several sets of coordinates that give the best Rfree convergence,
> and then
> subject those models to individual B-factor refinement.  After that,
> check again in Coot
> for areas of high B-factors and areas of poor geometry (try especially
> to improve
> your Ramachandran Plot).  Use MolProbity to help in identifying errors
> and clashes.
> If a few rounds of simulated annealing and model building don't improve
> things, try some 
> refinement in CCP4 Refmac.  PDB_REDO, which uses Refmac, can also help 
> give you alternative models.  While doing all these, don't be afraid to
> "cut-and-paste"
> regions from one model into another model and then correct the geometry
> in Coot.
> If B-factors don't come down no matter what you do, you could be in the
> wrong space 
> group or have problems with the original data that need to be addressed.
> 
> I hope this helps.
> 
> Daniel
> 
> ___
> 
> Daniel M. Himmel, Ph. D.
> 
> URL:  http://www.DanielMHimmel.com/index_Xtal.html
> <http://www.danielmhimmel.com/>
> 
> 
> 
> 
> On Thu, Nov 8, 2018 at 7:41 PM Anandhi Anandan
> mailto:anandhi.anan...@uwa.edu.au>> wrote:
> 
> Hello everyone,
> 
> 
> I am trying to solve the structure of a protein with a bound ligand
> at 2.65 A resolution. XDS was used for data reduction, phaser-MR for
> molecular replacement  and Phenix for refinement. The refinement was
> done with the default settings ( individual B factors, occupancy and
> TLS parameters). The resultant atomic B factors are quite high.  The
> overall B factor is 82 with a minimum value of 34.57 and maximum of
> 225.13. I would like to know if any of the data reduction parameters
> can affect the B factors and how best to deal with this issue.
> 
> 
> Anandhi
> 
> 
> 
> 
> 
> 


[ccp4bb] high B factor

2018-11-08 Thread Dale Tronrud
Hi,

   The "B factors" reflect the rapidity of the decrease in intensity of
scattering with increasing resolution.  The scattering of a crystal
becomes unmeasurable at a lower resolution if its intensities drop
faster.  This results in a connection between the resolution of a data
set and the average B factor of the atoms in the resulting model.  With
lower resolution come higher B factors.

   It is inevitable.  It cannot be avoided.

   An average B factor of 82 A^2 for a 2.65 A data set actually sounds
low to me.  Are you taking the TLS contributions into account when
calculating this average?  In any case, you certainly shouldn't be
worrying about your "high" B factors.
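
   To put numbers on it, a few lines of Python evaluating the
Debye-Waller factor exp(-B (sin(theta)/lambda)^2) = exp(-B/(4 d^2)) show
how much an atom with a given B still contributes at the 2.65 A edge
(the B values are chosen to roughly match the ones quoted below):

  import math

  def dw_on_f(b, d):
      """Attenuation of an atom's contribution to F at resolution d (A)
      for isotropic B (A^2): exp(-B/(4*d^2))."""
      return math.exp(-b / (4.0 * d * d))

  for b in (35.0, 82.0, 225.0):
      f = dw_on_f(b, 2.65)
      print(f"B = {b:5.1f} A^2 -> F down to {100*f:6.2f}%, I down to {100*f*f:8.4f}%")

An atom with B around 80 A^2 is already contributing only a few percent
of its zero-angle scattering at 2.65 A, which is exactly why data that
stop there go hand in hand with B factors of this size.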

Dale Tronrud

On 11/8/2018 4:30 PM, Anandhi Anandan wrote:
> Hello everyone,
> 
> 
> I am trying to solve the structure of a protein with a bound ligand at
> 2.65 A resolution. XDS was used for data reduction, phaser-MR for
> molecular replacement  and Phenix for refinement. The refinement was
> done with the default settings ( individual B factors, occupancy and TLS
> parameters). The resultant atomic B factors are quite high.  The overall
> B factor is 82 with a minimum value of 34.57 and maximum of 225.13. I
> would like to know if any of the data reduction parameters can affect
> the B factors and how best to deal with this issue.
> 
> 
> Anandhi
> 
> 
> 
> 
> 
> 


Re: [ccp4bb] Unidentified large blobs in the electron density

2018-08-05 Thread Dale Tronrud
   Positive difference map features indicate a likelihood that atoms
should be placed in that location.  Putting atoms in that density, any
atoms, will lower the R value.  This does not mean that your
interpretation is correct.

   The fact that you, the person who has seen this map most clearly,
can't decide between a PO4 and a Ade means that this density is
ambiguous.  These are two very different shapes!

   A constellation of isolated blobs is about the worst thing to figure
out.  It is quite possible that you have a partially occupied something
with ordered water molecules when that thing is not present.  When
looking for partially occupied things you have to contour the map at a
much lower level, but not take what you see too seriously.  Remember you
are looking for something at, say, 50% occupancy, with 50% occupied
water molecules sitting on top.

   That said, I don't understand your difference maps.  You built an ADN
into your blobs, and that molecule is quite a bit larger than your
blobs, yet you have a huge amount of positive difference density
covering your ADN.  How is this possible?  The goal of refinement is to
make the difference map go to zero at the location of atoms.  There is
something about these maps that you are not telling us.

   When interpreting blobs the first and most important thing to
consider is what is in the crystal.  If it doesn't contain PO4 there is
no reason to test PO4 as a possibility.  If you didn't add ADN then you
are hypothesizing that it was carried all the way through purification
which means that has to bind fairly strongly to survive even at partial
occupancy.  Is this location on your protein a nucleotide binding site?
These things are easily recognized just from the structure of the protein.

   Does the hydrogen bonding and charge-charge interactions of your
model make sense?  It is hard to tell in a flat picture, but I don't see
many hydrogen bonds to your ADN model.  If I compare your PO4 model to
the structure I see in the ADN map, I see that there is a ASP right next
to that blob.  You can't put a PO4 next to an ASP.

   Since the map is confusing you have to use as much information from
it as possible (lower contours) but add in as much information from
other sources as possible.  What's in the crystal?  What is your
cryoprotectant?  Is this part of the protein a known binding motif for
something?  What is the function of this protein and what sorts of
compounds might be expected to bind to it?  Is this an enzyme and is the
spot anywhere near the active site?  Do you know where the active site is?

   Once you build a model you have to test it.  You should be your worst
critic!  As I said, a drop in R value is meaningless.  Does the
chemistry make sense?  Can you explain why that molecule is there?  Does
it have a purpose?  Can you perform an experiment that confirms your
model?  Can you soak more of that compound into the crystal and see an
increase in occupancy?  Can you analyze the crystal by some other means
to detect the compound?  Mass Spec?  If PO4, can you detect the presence
of Phosphorus?  The validation has to be designed based on your model.

   You can get better maps than simple Fo-Fc maps.  I really like the
maps produced by Buster and have great success with getting better views
that way.  It is possible that the "Polder" map from Phenix might help -
I'm not too clear on the difference between it and the Buster map.

   Load as much information into your head as you can, get the best
possible map(s), and start running though the possibilities!  And be
willing to accept that you may never figure it out.  Don't build a model
you don't believe.

   I once spent about twenty years trying to figure out a blob (not full
time!).  I got a nice paper about it in the end.

Dale Tronrud


On 8/5/2018 7:00 AM, Preeti Preeti wrote:
> Dear CCP4 member
> 
> I am solving a protein structure at 2.2 Angstrom resolution. I could
> see some blobs (2fo-fc @ 1 sigma and fo-fc @ 2.5 sigma) and need your
> suggestions on these extra electron densities. In addition, in
> one of the large blobs I have added a phosphate group.
>  This is a nucleotide-binding protein; however, the structure I am showing
> here was of a native protein crystal without any nucleotide soaking, and
> also no phosphate buffer was used at any time during the purification as
> well as crystallization process.  
> Furthermore, in one such blob I tried fitting an adenine moiety; though
> it does not fit the map exactly, it decreased the Rfree value
> significantly.
> 
> Could you kindly suggest what it might correspond to?
> 
> Also please let me know if any other structure information required
> regarding this protein data or this blob density
> 
> Thanks a lot in advance
> 
> 
> 
> 
> 
> 
> 
> Preeti
> 
> 
> 
> ---

Re: [ccp4bb] RMS bond and angle

2018-07-05 Thread Dale Tronrud
   I'm not quite sure what is "wrong" here except perhaps for the idea
of "manual correction".  If you calculate the rmsd for thousands of
angles and you change four of them it is unlikely there will be a large
change in its value.  The reason programs disclose both the overall rmsd
and the individual outliers is that the two measures of quality are
mostly unrelated.
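
   A back-of-the-envelope sketch (the counts here are invented, purely
to show the arithmetic) makes the point:

  import numpy as np

  rng = np.random.default_rng(1)
  dev = rng.normal(0.0, 1.79, 2000)   # angle deviations from ideal, degrees
  dev[:4] = 8.0                       # four >4-sigma outliers
  before = np.sqrt(np.mean(dev**2))
  dev[:4] = 1.0                       # "fix" the four outliers
  after = np.sqrt(np.mean(dev**2))
  print(f"rmsd before {before:.2f} deg, after {after:.2f} deg")

Correcting four angles out of a couple of thousand moves the rmsd by a
few hundredths of a degree, which is essentially what you saw.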

   Different programs have differing libraries of "ideal" values for
bond lengths and angles and will report slightly different rmsd values.
Why do you consider the different values reported by Buster and
MolProbity to be a problem?

   What does concern me is the idea of "manual correction" of an
outlier.  The model you deposit should have been produced by a
refinement program, without arbitrary changes made by the user.  If the
refinement program produces a model with outliers you can look at those
instances and identify problems with your model.  Maybe you can change
the rotamer of a side chain or perhaps there is a mistake in your ideal
geometry cif.  If you find a good reason why the computer is creating
that outlier and correct the problem, you can run more refinement and
get a model with fewer outliers.

   If you simply change the model in Coot to make the outlier "go away"
and declare that edited model to be your final model, you are not doing
it right.  In most cases you will find that the refinement program would
take that model and recreate the outlier, so you haven't accomplished
anything other than hiding a shortcoming of your model.

   If you can't figure out how to get the program to produce a model
without that outlier then your model should be deposited with it, and
the ultimate users of your model can see that this region, at least, of
your model should be considered less reliable.  Tricking people into
placing too much trust in your model is not a good idea.

Dale Tronrud.


On 7/5/2018 1:11 AM, zheng zhou wrote:
> Hi all
> 
> Just finishing up a new structure at 2.4A. Buster refine gives RMS bond
> 0.008 and angle 1.13, while MolProbity gives 0.01 and 1.83 degree. I
> checked the 4 outliers from molprobity……>4sigma. After manual
> correction, warning goes off, but RMS angle only goes down to 1.82
> 
> I am using Phenix 1.13 and buster2.11.6
> 
> Could not figure out what went wrong.
> 
> Sorry for not CCP4 related questions.
> 
> Thanks 
> Joe
> 
> 
> 
> 
> 


Re: [ccp4bb] Calculating sigma value

2018-05-01 Thread Dale Tronrud
Dear Ian,

   Clearly you have a passion for the SI.  After our last discussion of
this topic on the CCP4BB I downloaded "The International System of
Units: NIST Special Publication 330" and actually read it.
(https://dx.doi.org/10.6028/NIST.SP.330e2008)  The SI is a fine system
of units and this document is well written and easy to follow.

   I don't believe that you and I have any serious disagreements about
"units".  I agree with pretty much everything you have written in your
letter.

   We do seem to have a difference of opinion about the style of writing
appropriate for an informal venue such as the CCP4BB.

   While the SI units of a "quantity density" would be "inverse cubic
nanometers" the actual value would remain meaningless if the nature of
the "quantity" was left unspecified.  The document mentioned above makes
this clear (page 22) and I'm sure you agree as well.

   I think you believe that just saying "the map value is electron
density" is sufficient to lead the reader to understand that the
"quantity" with  SI'ish units of "/Angstrom^3" is, indeed, "number of
electrons", but I disagree.  After all, people commonly refer to the rms
normalized maps as "electron density" maps as well, so the term
"electron density" has been overloaded with several meanings and is now
ambiguous.  In a forum where people routinely use abbreviations such as
"IMHO", and "A" instead of "Angstrom", it seems to me ridiculous to say
"the number density of electrons in SI units of per Angstrom cubed" when
I could simply say "electrons/A^3" knowing that most readers would
understand my intent.

   In your post you defined electron density (in quotes) as "... or the
number of electrons per unit volume".  Those are pretty much exactly the
words the little voice in my head utters when I read the symbols
"electrons/A^3".  I never said that "electrons" is an SI unit, or that I
was even trying to obey the formality of the SI conventions.

   I will argue with you about the idea that electron density and proton
density are commensurate.  I don't really know of an application where
the sum of electron density and proton density makes any sense, much
less the sum of electron and rabbit density.  Any class library I would
write would throw an exception if you attempted to calculate such a thing.

   I presume your calculation is stepping toward the calculation of
charge density.  I would first calculate the charge density due to the
electron density, using the knowledge that there is one quantum of
negative charge for each electron, then calculate the charge density due
to the proton density, using knowledge of the charge of a proton, and
then sum the two charge densities, which are not only commensurate but
identical.  My code would then be ready should I run into a Xi baryon
with +2 charge whereas your code would have to be completely refactored. ;-)
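
   To make that concrete, the sort of toy class design I have in mind
would look something like this (pure illustration, not any existing
library):

  from dataclasses import dataclass

  ELECTRON_CHARGE = -1.0    # in units of the elementary charge
  PROTON_CHARGE = +1.0

  @dataclass
  class NumberDensity:
      """Number of particles of one species per A^3."""
      species: str
      value: float

      def __add__(self, other):
          if not isinstance(other, NumberDensity) or other.species != self.species:
              raise TypeError("refusing to add number densities of different species")
          return NumberDensity(self.species, self.value + other.value)

      def to_charge_density(self, charge_per_particle):
          return ChargeDensity(self.value * charge_per_particle)

  @dataclass
  class ChargeDensity:
      value: float          # elementary charges per A^3
      def __add__(self, other):
          return ChargeDensity(self.value + other.value)   # always commensurate

  electrons = NumberDensity("electron", 0.43)
  protons = NumberDensity("proton", 0.40)
  print(electrons.to_charge_density(ELECTRON_CHARGE)
        + protons.to_charge_density(PROTON_CHARGE))    # fine
  # electrons + protons would raise TypeError, as it should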

Dale Tronrud

 P.S. Before the United States Department of Commerce comes knocking on
my door, I expect they would prosecute the RCSB.  Do you realize that
this criminal organization uses taxpayer money to store and distribute
files containing calculated structure factors represented as the number
of electrons per unit cell?!


On 4/30/2018 3:28 AM, Ian Tickle wrote:
> 
> Dale,
> 
> On 19 April 2018 at 17:36, Dale Tronrud <de...@daletronrud.com
> <mailto:de...@daletronrud.com>> wrote:
> 
>    The meaning of the term "e/A^3" as used in Coot has nothing to do
> with the charge of an electron.  The intention of its authors is to
> indicate that the value being represented by the map is the density of
> electrons.  It is the number of electrons per cubic Anstrom at that
> point in space.
> 
> 
> My intention was certainly not to imply that the unit designated 'e' in
> the units of electron density used by Coot should be taken to mean
> charge (in fact quite the opposite!), even though 'e' would normally be
> interpreted here as the atomic unit of charge
> (see 
> https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication811e2008.pdf,
> Table 8). 
> 
>    I, generally, find it useless to say that a number is a density
> (units of "per volume") without saying density of what.  This is not a
> density of charge, not a density of mass, not a density of rabbits.  It
> is a density of electrons.
> 
> 
> First, note that (SI Guide cited above, section 7), quote: "The value of
> a quantity is its magnitude expressed as the product of a number and a
> unit, and the number multiplying the unit is the numerical value of the
> quantity expressed in that unit".  So by stating that the electron
> density is X electrons/A^3, you 

Re: [ccp4bb] Calculating sigma value

2018-04-19 Thread Dale Tronrud
   The meaning of the term "e/A^3" as used in Coot has nothing to do
with the charge of an electron.  The intention of its authors is to
indicate that the value being represented by the map is the density of
electrons.  It is the number of electrons per cubic Angstrom at that
point in space.

   I, generally, find it useless to say that a number is a density
(units of "per volume") without saying density of what.  This is not a
density of charge, not a density of mass, not a density of rabbits.  It
is a density of electrons.

   Now the calculations are pretty simple.  When the refinement program
produces a model one of its parameters is a scale factor which relates
the relative values of the observed intensity to those calculated from
the model.  This scale factor, which is usually not written out in an
obvious place since its value is not very interesting, is just a simple
number, not a function of resolution or anything like that.  Its value
is also sensitive to a number of assumptions (like you are modeling
everything in your unit cell) so it is not particularly reliable.  This
means you shouldn't take the density values of your map terribly
seriously.  The differences in density from place to place are much more
reliable.

   This has led to the practice of converting the density values to
normalized values.  Here you calculate the rms value of the points in
the map and divide all the density points by that number.  "rms" is
simply what it says: "Root of the Mean of the Squares".  Again this is
just a number, nothing fancy.
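
   For concreteness, the normalization is nothing more than this (a
numpy sketch, with a random array standing in for a real map grid):

  import numpy as np

  rho = np.random.default_rng(2).normal(0.0, 0.3, size=(48, 48, 64))  # stand-in map

  rms = np.sqrt(np.mean(rho**2))    # "Root of the Mean of the Squares"
  rho_in_rms_units = rho / rms      # what gets contoured at "1 sigma", "3 sigma", ...

  # Parseval's theorem lets a program get the same number directly from the
  # Fourier coefficients, without ever writing the map out on a grid.
  print(round(float(np.sqrt(np.mean(rho_in_rms_units**2))), 6))   # 1.0 by construction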

   The only trick is deciding what region of space to sum.  The crystal
is composed of regions which are occupied by ordered molecules (One
hopes this includes the molecule you are interested in!) and regions of
bulk solvent.  You could reasonably conclude that the rms's of these
regions should be considered different properties of the map.  I'm not
aware of any software that actually tries to calculate an rms for just
the region of the map that contains ordered structure.

   What was done in the past was to simply calculate the sum over the
region of the map presented to the program, and this was usually a
rectangular box inscribed over the molecule.  The corners of that box
would cover some amount of bulk solvent (and symmetry related molecules)
which depends on a lot of factors which shouldn't be affecting your
choice of contour levels.  This method is inconsistent and causes confusion.

   The conventional method today is to calculate the rms by summing
over the entire asymmetric unit.  This, at least, creates consistency,
and can be calculated from the Fourier coefficients making it an easy
number to come up with.  Many programs now use this method, including
Coot (As I understand).  It does have the drawback that a crystal with a
large amount of solvent will have a lower rms than one that is very dry,
even if the variation within the ordered structure is the same.  You
need to be aware of this when interpreting maps whose contours are based
on the rms.  Showing the contour level both as rms and electrons/A^3 is
an attempt to provide a fuller description.

   Ian is certainly right -- the rms of a function is not a "sigma".
This confusion is a problem that is endemic in our field!  "sigma" is
shorthand for standard deviation which is a measure of the uncertainty
in our knowledge about a value.   The rms is not a measure in any way of
uncertainty -- It is simply a description of how variable the values of
the function are.  James Holton has written a very nice paper on this
topic, but I don't have the reference on hand.

   The excuse for this confabulation is that people believe, in a Fo-Fc
style map, most of the values should represent factors other than errors
in the structural model and therefore one can estimate the uncertainty
of the map by calculating the rms over the map.  This assumption is
highly questionable and unreliable.  The major problem then arises when
the same logic is extrapolated to a 2Fo-Fc style map.  In these maps the
variability in the ordered region of the crystal is all "signal" so
calculating an rms really has nothing to do with uncertainty.

   Describing your contours or peaks in an Fo-Fc style map by rms can
sort-of, kind-of, be justified, but it makes no sense for a 2Fo-Fc style
map.  If I want to be really serious about deciding if a peak in a map
may be missing atoms, I will leave some known atoms out of the model and
see how the heights of their difference peaks compare to the heights of
the mysterious peaks.  This method is fairly insensitive to the
systematic problems that affect both rms and electrons/A^3.

Dale Tronrud

On 4/19/2018 8:30 AM, Ian Tickle wrote:
> 
> Hi Mohamed
> 
> The RMSD of the electron density (or difference density) is calculated
> by the FFT program using the standard equation that I referenced.  I
> would guess that what you se

Re: [ccp4bb] Arg distorsion during refinement

2018-02-09 Thread Dale Tronrud
   Coincidentally, I just noticed this problem while perusing a
validation report last night and am tooling up to see if I can get this
problem fixed.

   The origin of the flaw is the asymmetry of the NH1 and NH2 atoms in
Arginine.  If one considers only bonding it would appear that these two
atoms are interchangeable, but one clashes with CD and the other does
not.  The clash causes one bond angle to be larger than 120 deg and the
other to be smaller.  To allow this situation to be described the IUPAC
convention says that the atom near CD should be named NH1 and the other
one name NH2.  This naming convention is one of the things that Coot
begs to be allowed to fix every time you start it.

   In the Engh & Huber library (2001) this convention was not imposed
and their analysis was performed with the assumption that these two
angles are equivalent.  The result is a symmetrized value midway between
the two correct values, which as you note are about 123 deg and 117 deg.

   The bundling together of these two angles shouldn't cause a
validation problem except for the fact that Engh & Huber assigned the
impossibly small sigma of 0.5 deg to this angle.  A 50/50 mixture of
angles each centered on 123 and 117 should have a sigma upwards of six
degrees.  Such a sigma would give a pass to correct models, but a 0.5
deg sigma will flag correct models with 6 sigma deviations.

   Until we can get the validation report generator fixed you should
tell this story to anyone who complains about your model.  Your model is
fine and the validation software needs better validation itself.

Dale Tronrud


On 2/9/2018 10:18 AM, Oganesyan, Vaheh wrote:
> Dear crystallographers,
> 
>  
> 
> Lately when refining a structure (at 2.8A) with Refmac5 I’ve found that
> nearly all Arg residues get distorted at one angle:  NE-CZ-NH1(2).
> Starting model has 120°, final model 123°(117°), which validation server
> considers a major issue. May any of you recognize why is this happening?
> Don’t remember seeing anything like this before.
> 
> Current CCP4 version 7.0.050; Refmac5 version 5.8.0189.
> 
>  
> 
> Last structure deposited in December’17 did not have those issues. CCP4
> version then was 7.0.047; Refmac5 version was the same.
> 
>  
> 
> Thank you for your time.
> 
>  
> 
> Regards,
> 
>  
> 
> /Vaheh Oganesyan/
> 
> /MedImmune, ADPE/
> 
> /www.medimmune.com/
> 
>  
> 
>  
> 
> To the extent this electronic communication or any of its attachments
> contain information that is not in the public domain, such information
> is considered by MedImmune to be confidential and proprietary. This
> communication is expected to be read and/or used only by the
> individual(s) for whom it is intended. If you have received this
> electronic communication in error, please reply to the sender advising
> of the error in transmission and delete the original message and any
> accompanying documents from your system immediately, without copying,
> reviewing or otherwise using them for any purpose. Thank you for your
> cooperation.


Re: [ccp4bb] Electrostatic Potential: Poisson-Boltzmann

2017-12-02 Thread Dale Tronrud
   I don't know anything about the practicalities of PDB2PQR but it
would seem to me that you have to calculate a potential for the molecule
with each conformation.  Then you would say "This is the potential with
the A altloc and THIS is the potential with B".  There will be no
individual molecule with the average potential so the average has no
chemical meaning.

   Of course life gets even harder when you have multiple side chains
with multiple conformations.  The number of combinations grows quickly.  If you
don't believe that the conformations are all tied to each other you have
to say that the ensemble of conformations leads to an ensemble of
potentials.  Somehow your chemistry has to work in the presence all this
variability and THAT is the interesting question you have to answer.

   I don't know what an ensemble of potentials looks like so one would
have to calculate some and see what their properties are.  I'm not aware
that anyone has done this, but my literature search has been very limited.

Dale Tronrud

On 12/2/2017 5:51 AM, Sam Tang wrote:
> To add to the discussion, could I raise a relevant question about
> generating ESP (Apologies to Jiri if this distracts too much from your
> initial thread). 
> 
> In our structure in hand, the density for two conformations of the side
> chain are clearly seen and they could be modeled. This brings a bit of
> problem because the positive charge becomes more prominent with two
> conformations there than with one. So what do we usually do when
> generating ESP for such structures with alternate conformations? Do we
> remove one before the calculation?
> 
> PS - I use online PDB2PQR server to do my calculation with PARSE field.
> I did notice from some old archived discussion on the Web that it
> ignores one conformation by default. But this seemingly is not the case
> in newer versions?
> 
> Regards
> 
> Sam
> 
> School of Life Sciences, CUHK
> 
> On 2 December 2017 at 02:59, Robbie Joosten <robbie_joos...@hotmail.com
> <mailto:robbie_joos...@hotmail.com>> wrote:
> 
> If you cannot trust the surface of your protein, perhaps you should
> not look at the the potential on the surface. Instead you can look
> at the field around your protein. This is less precise, but also
> less sensitive to local errors. If you want to know how your peptide
> finds your protein, this is actually more informative anyway.
> 
> There must be several programs that do this. I have done this for
> MHC in the past with YASARA. It really explained nicely how the
> peptide moved in.
> 
>  
> 
> Cheers,
> 
> Robbie
> 
>  
> 
> Sent from my Windows 10 phone
> 
>  
> 
> --------
> *From:* CCP4 bulletin board <CCP4BB@JISCMAIL.AC.UK
> <mailto:CCP4BB@JISCMAIL.AC.UK>> on behalf of Dale Tronrud
> <de...@daletronrud.com <mailto:de...@daletronrud.com>>
> *Sent:* Friday, December 1, 2017 7:29:01 PM
> *To:* CCP4BB@JISCMAIL.AC.UK <mailto:CCP4BB@JISCMAIL.AC.UK>
> *Subject:* Re: [ccp4bb] Electrostatic Potential: Poisson-Boltzmann
>  
>    These are not easy questions to answer.  Certainly atoms,
> particularly ones that are charged, even with fractional charges, have a
> strong effect on the ESP.  If you delete them because you don't know
> exactly where they are you will get a different answer than if you put
> them in in some reasonable but unsupported location (as you have found).
>  This result indicates that the peptide does affect the ESP
> significantly and you have to consider it.
> 
>    You could build lots of models with the peptide in different
> conformations and average all the maps.  This misses the point.  You
> have uncertainty in your model which means that you have uncertainty in
> your electrostatic potential.  Any particular ESP that you calculate and
> draw conclusions from will have a large uncertainty and you must
> consider that uncertainty when deciding between your potential
> conclusions.  (I'm not sure if the pun is intended or not!)
> 
>    I suppose you could believe that each possible conformation exists to
> some extent in reality which means that all the ESP's you calculate
> exist in some fraction of the molecules in the cell.  It is possible
> that only the molecules with a particular conformation of this peptide
> have the ESP that allows the molecule to function.  Life is hard.
> 
>    Another issue that you must consider: If the exact conformation of
> this loop causes changes to the ESP that you consider significant to
> your unders

Re: [ccp4bb] Electrostatic Potential: Poisson-Boltzmann

2017-12-01 Thread Dale Tronrud
   These are not easy questions to answer.  Certainly atoms,
particularly ones that are charged, even with fractional charges, have a
strong effect on the ESP.  If you delete them because you don't know
exactly where they are you will get a different answer than if you put
them in in some reasonable but unsupported location (as you have found).
 This result indicates that the peptide does affect the ESP
significantly and you have to consider it.

   You could build lots of models with the peptide in different
conformations and average all the maps.  This misses the point.  You
have uncertainty in your model which means that you have uncertainty in
your electrostatic potential.  Any particular ESP that you calculate and
draw conclusions from will have a large uncertainty and you must
consider that uncertainty when deciding between your potential
conclusions.  (I'm not sure if the pun is intended or not!)

   I suppose you could believe that each possible conformation exists to
some extent in reality which means that all the ESP's you calculate
exist in some fraction of the molecules in the cell.  It is possible
that only the molecules with a particular conformation of this peptide
have the ESP that allows the molecule to function.  Life is hard.

   Another issue that you must consider: If the exact conformation of
this loop causes changes to the ESP that you consider significant to
your understanding of the function of this protein, the presence and
conformation of neighboring proteins and solvent will also cause
significant changes to the ESP.  The biological context of the protein
becomes important.  If your interpretation depends critically on the
value and distribution of ESP then I'm not sure you can work this out
based on calculated ESP, considering the large uncertainty.

Dale Tronrud


On 12/1/2017 10:02 AM, chemocev marker wrote:
> Hi
> 
> I am calculating the electrostatic potential of my protein. There
> were a few flexible regions with high B-factors, and I deleted that part of
> the protein and then recalculated it. But then I see a big
> change. I have structures in the presence and the absence of the
> peptide; these flexible regions have a better map in the
> structure without the peptide, and with the peptide I have to delete them.
> I have a question: should I model these missing regions
> 
> or
> 
> I should ignore them
> 
> best
> 
> Jiri


Re: [ccp4bb] AW: Re: [ccp4bb] Basic Crystallography/Imaging Conundrum

2017-11-12 Thread Dale Tronrud
On 11/12/2017 6:48 AM, Kay Diederichs wrote:
> On Fri, 10 Nov 2017 14:04:26 -0800, Dale Tronrud <de...@daletronrud.com> 
> wrote:
> ...
>>
>>   My belief is that the fact that our spot intensities represent the
>> amplitude (squared) of a series of Sin waves is the result of the hard
>> work of people like Bob who give us monochromatic illumination.
>> "Monochromatic" simply means it is a pure Sin wave.  If Bob could get
>> that shiny new ring of his to produce an electromagnetic square wave his
>> users would still get diffraction patterns with spots but they would
>> have to come up with programs that would perform Fourier summations of
>> square waves to calculate electron density.  Our instrument is an analog
>> computer for calculating the Sin wave Fourier transform of the electron
>> density of our crystal because we designed it to do exactly that.
>>
>> Dale Tronrud
>>
> ...
> 
> Hi Dale,
> 
> Well, perhaps I understand you wrongly, but I'd say if Bob would succeed in 
> making his synchrotron produce "square" instead of sine waves then we would 
> not have to change our programs too much, because a "square wave" can be 
> viewed as (or decomposed into) superpositions of a sine wave of a given 
> frequency/energy with its higher harmonics, at known amplitude ratios.
> This would be similar in some way to a Laue experiment, but not using a 
> continuum of energies, only discrete ones. The higher harmonics would just 
> change the intensities a bit (e.g. the 1,2,3 reflection would get some 
> additional intensity from the 2,4,6 and 3,6,9 reflection), and that change 
> could to a large extent be accounted for computationally, like we currently 
> do in de-twinning with low alpha. 
> That would probably be done in data processing, and might not affect the 
> downstream steps like map calculation.

   What you are describing (which is absolutely correct) sounds like a
lot more programming work than writing a square-wave Fourier transform
program.

   All I'm doing is trying to answer the very intriguing question that
beginners ask, but we old-timers tend to forget - Why are the intensities
of the Bragg spots the square of the amplitudes of SIN waves?  The answer
I'm proposing is that the illumination source is a Sin wave so the
diffraction spots are in reference to Sin waves.  If Bob could give us
square waves the spot intensity would be proportional to the square of
the square wave Fourier transform of the density.  If ALS could give us
triangular waves their spots would tell us about the triangular wave
Fourier transform.

   While you want to continue to live in the Sin-wave world despite
having square waves in your experiment, I could be perverse and do the
same from my world.  Your Sin waves can be expressed as a sum of the
harmonics of my square waves and I could say that the intensity of what
you call the 1,2,3 reflection contains information from what I would
call the 1,2,3 and 2,4,6 and 3,6,9 (and so on) reflections.  The
mathematics is general and not specific to Sin waves.  It just happens
that it is easier for Bob to provide us with Sin wave illumination and
so our analysis uses Sin waves.
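
   If anyone wants to see those "known amplitude ratios" fall out of the
numbers, a few lines of numpy will do it (a unit square wave, purely for
illustration):

  import numpy as np

  n = 4096
  x = np.arange(n) / n
  square = np.sign(np.sin(2 * np.pi * x))        # one period of a square wave

  amps = np.abs(np.fft.rfft(square)) * 2.0 / n   # sine-wave amplitudes
  for harmonic in range(1, 8):
      print(f"harmonic {harmonic}: amplitude {amps[harmonic]:.4f}")
  # Odd harmonics come out near 4/(pi*h) for odd h: 1.273, 0.424, 0.255, ...;
  # even harmonics come out near zero.  The square wave really is a fixed
  # superposition of sine waves, which is Kay's point.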

   This is quite abstract, but in the free electron laser world the
pulses are getting so short that they can't make the plane-wave
approximation and have to analyze their images in terms of the wave
packet, with its inherent bandwidth and coherence between the individual
frequencies within the packet.  See, my Sin-wave bias is showing -
"bandwidth" and "frequencies" both come from an insistence on reducing
all problems to Sin waves.  Maybe the free electron people would do
better by following Ethan and thinking about wavelets...

Dale Tronrud

> 
> best,
> 
> Kay
> 


Re: [ccp4bb] AW: Re: [ccp4bb] Basic Crystallography/Imaging Conundrum

2017-11-10 Thread Dale Tronrud
On 11/10/2017 1:38 PM, Robert Sweet wrote:
> This has been a fascinating thread. Thanks.
> 
> I will dip my oar in the water.  Here are a couple of snippets.
> 
>> Jacob: It was good of proto-crystallographers to invent diffraction as
>> a way to apply Fourier Series.
> 
> and
> 
>> Ethan: So here's the brain-teaser: Why does Nature use Fourier
>> transforms rather than Wavelet transforms? Or does she?
> 
> Probably Jacob was joking, but I believe we should say that physicists
> (and Ms. Nature) employ the Fourier transform/synthesis because this
> models pretty precisely the way that we believe light rays/waves of all
> energies interfere with one another.
> 
> Warm regards, Bob

   My belief is that the fact that our spot intensities represent the
amplitude (squared) of a series of Sin waves is the result of the hard
work of people like Bob who give us monochromatic illumination.
"Monochromatic" simply means it is a pure Sin wave.  If Bob could get
that shiny new ring of his to produce an electromagnetic square wave his
users would still get diffraction patterns with spots but they would
have to come up with programs that would perform Fourier summations of
square waves to calculate electron density.  Our instrument is an analog
computer for calculating the Sin wave Fourier transform of the electron
density of our crystal because we designed it to do exactly that.

Dale Tronrud


> 
> 
> On Fri, 10 Nov 2017, Keller, Jacob wrote:
> 
>>>> My understanding is that EM people will routinely switch to
>>>> diffraction mode when they want accurate measurements.  You lose the
>>>> phase information but, since EM lenses tend to have imperfections,
>>>> you get better measurements of the intensities.
>>
>> Only to my knowledge in the case of crystalline samples like 2D crystals.
>>
>>>> Of course the loss of phases is a serious problem when you don't
>>>> have a model of the object as precise as our atomic models.
>>
>> From where does this precision arise, I wonder? I guess priors for
>> atom-based models are pretty invariant. On the other hand, who says
>> that such priors, albeit of many more varieties, don't exist for
>> larger biological samples, such as zebrafish brains and drosophila
>> embryos/larvae? Anyway, right now, the state of the art of modelling
>> in these fluorescence data sets is hand-drawing circles around things
>> that look interesting, hoping the sample does not shift too much, or
>> perhaps using some tracking. But it could be so much better!
>>
>> JPK
>>
>>
> 


Re: [ccp4bb] AW: Re: [ccp4bb] Basic Crystallography/Imaging Conundrum

2017-11-10 Thread Dale Tronrud
On 11/10/2017 7:55 AM, Keller, Jacob wrote:
> "Quality of image" has a lot of parameters, including resolution, noise, 
> systematic errors, etc. I am not aware of a global "quality of image" metric.
> 
> One other consideration, not related to your comment: imagine if we had an 
> x-ray lens through which we could take confocal images of a protein molecule 
> or crystal, output as a voxel array. Would we really still prefer to measure 
> diffraction patterns rather than the equivalent real space image, even 
> assuming we had some perfect way to solve the phase problem? Or conversely, 
> should we try to do fluorescence imaging in diffraction mode, due to its 
> purported information efficiency?

   It depends on the quality of your lens.  My understanding is that EM
people will routinely switch to diffraction mode when they want accurate
measurements.  You lose the phase information but, since EM lenses tend
to have imperfections, you get better measurements of the intensities.
Of course the loss of phases is a serious problem when you don't have a
model of the object as precise as our atomic models.

   The lens in a microscope tends to be of very high quality and you
don't have precise models of the object to calculate phases so there is
no advantage in going to "diffraction mode".

Dale Tronrud

> 
> JPK
> 
> -Original Message-
> From: herman.schreu...@sanofi.com [mailto:herman.schreu...@sanofi.com] 
> Sent: Friday, November 10, 2017 10:22 AM
> To: Keller, Jacob <kell...@janelia.hhmi.org>; CCP4BB@JISCMAIL.AC.UK
> Subject: AW: [ccp4bb] AW: Re: [ccp4bb] Basic Crystallography/Imaging Conundrum
> 
> At the bottom line, it is the quality of the image, not only the amount of 
> pixels that counts. Adding more megapixels to a digital camera with a poor 
> lens (as some manufacturers did), did not result in any sharper or better 
> images.
> Herman
> 
> 
> -Ursprüngliche Nachricht-
> Von: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] Im Auftrag von 
> Keller, Jacob
> Gesendet: Freitag, 10. November 2017 15:48
> An: CCP4BB@JISCMAIL.AC.UK
> Betreff: [EXTERNAL] Re: [ccp4bb] AW: Re: [ccp4bb] Basic 
> Crystallography/Imaging Conundrum
> 
> It seems, then, to be generally agreed that the conversion between voxels and 
> Fourier terms was valid, each containing the same amount of information, but 
> the problem was in the representation, and there was just trickery of the 
> eye. I was thinking and hoping this would be so, since it allows a pretty 
> direct comparison of crystal data to microscopic imaging data. I guess a 
> litmus test would be to decide whether a voxel version of the electron 
> density map would work equivalently well in crystallographic software, which 
> I suspect it would. If so, then the same techniques--so effective in 
> extracting information for the relatively information-poor crystal 
> structures--could be used on fluorescence imaging data, which come in voxels.
> 
> Regarding information-wealth, in Dale's example, the whole hkl set was 4.1 
> MB. One frame in a garden-variety XYZT fluorescence image, however, contains 
> about 2000 x 2000 x 100 voxels at 16-bit, i.e., 400 million bits or 50 MB. In 
> some data sets, these frames come at 10 Hz or more. I suspect that the 
> I/sigma is also much better in the latter. So, with these data, and keeping a 
> data:parameters ratio of ~4, one could model about 100 million parameters. 
> This type of modelling, or any type of modelling for that matter, remains 
> almost completely absent in the imaging world, perhaps because the data size 
> is currently so unwieldy, perhaps also because sometimes people get nervous 
> about model biases, perhaps also because people are still improving the 
> imaging techniques. But just imagine what could be done with some 
> crystallography-style modelling!
> 
> Jacob Keller
> 
> 
> 
> -Original Message-
> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Tristan 
> Croll
> Sent: Friday, November 10, 2017 8:36 AM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] AW: Re: [ccp4bb] Basic Crystallography/Imaging Conundrum
> 
> Or a nice familiar 2D example: the Ramachandran plot with 7.5 degree binning, 
> as a grid (left) or with bicubic smoothing (right). Different visualisations 
> of the same data, but the right-hand image uses it better.
> 
> On 2017-11-10 08:24, herman.schreu...@sanofi.com wrote:
>> In line with Dale's suggestions, I would suggest that you reformat 
>> your voxel map into the format of an electron density map and look at 
>> it with coot. I am sure it will look much better and much more like 
>> the electron density we are used to look at. Alternatively, you c

Re: [ccp4bb] AW: Re: [ccp4bb] Basic Crystallography/Imaging Conundrum

2017-11-10 Thread Dale Tronrud
   A second observation of the same experimental quantity does not
double the amount of "information".  We know from the many discussions
on this forum that the benefit of additional multiplicity diminishes with
repetition.

   Measuring "information content" is very hard.  You can't just count
the bytes and say that measures the information content.  My example of
an oversampled map proves the point - The file is much bigger but can be
calculated exactly from the same, relatively small, number of
reflections.  The ultimate extreme is a map calculated from just the
F000 term.  One number can produce a map with gigabytes of data - It
just happens that all the numbers are equal.
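
   A one-dimensional toy in numpy shows what I mean: take a handful of
Fourier coefficients, evaluate the "map" on a coarse grid and on a grid
eight times finer, and the bigger file contains nothing that the same
small set of coefficients did not already determine:

  import numpy as np

  rng = np.random.default_rng(3)
  n_coef = 16                                  # a "small" number of reflections
  F = rng.normal(size=n_coef) + 1j * rng.normal(size=n_coef)
  F[0] = 40.0                                  # the F000 term: just the average level

  def density(n_grid):
      """Re sum_h F_h exp(2*pi*i*h*x) on n_grid points (positive h only,
      purely for illustration)."""
      x = np.arange(n_grid) / n_grid
      h = np.arange(n_coef)
      return np.real(np.exp(2j * np.pi * np.outer(x, h)) @ F)

  coarse = density(32)     #  32 numbers
  fine = density(256)      # 256 numbers, an 8x bigger "file"
  print(np.allclose(coarse, fine[::8]))   # True: nothing new in the big file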

   While our Bragg spots are pretty much independent measurements, after
merging, Herman is right about microscopes.  The physical nature of the
instrument introduces relationships between the values of the voxels so
the information content is smaller, perhaps by a lot, than the number of
bytes in the image.  You have to have a deep understanding of the lens
system to work out what is going on.  And a second image of the same
instrument of the same object measured a mSec later will be very highly
correlated with the first and add very little new "information" to the
experiment.

   BTW while we write maps as a set of numbers arranged in the 3D array,
it is not equivalent to an image.  The pixels, or voxels in 3D, indicate
the average value of that region while our map files contain the value
of the density at a particular point.  Our numbers are very distinct,
while pixels can be quite confusing.  In many detectors the area
averaged over is somewhat larger than the spacing of the pixels giving
the illusion of greater detail w/o actually providing more information.
This occurs in our CCD detectors where the X-ray photons are converted
to a lower frequency light by some sort of phosphor and in a microscope
by a poor lens (also as mentioned by Herman).

   Measuring information content is hard, which is why it is usually not
considered a rigorous quantity.  The classic example is the value of
ratio of the circumference of a circle to its diameter.  This number has
an infinite number of digits which could be considered an infinite
amount of information.  I can simply type "Pi", however, and accurately
express that infinity of information.  Just how much information is present?

Dale Tronrud

On 11/10/2017 6:47 AM, Keller, Jacob wrote:
> It seems, then, to be generally agreed that the conversion between voxels and 
> Fourier terms was valid, each containing the same amount of information, but 
> the problem was in the representation, and there was just trickery of the 
> eye. I was thinking and hoping this would be so, since it allows a pretty 
> direct comparison of crystal data to microscopic imaging data. I guess a 
> litmus test would be to decide whether a voxel version of the electron 
> density map would work equivalently well in crystallographic software, which 
> I suspect it would. If so, then the same techniques--so effective in 
> extracting information for the relatively information-poor crystal 
> structures--could be used on fluorescence imaging data, which come in voxels.
> 
> Regarding information-wealth, in Dale's example, the whole hkl set was 4.1 
> MB. One frame in a garden-variety XYZT fluorescence image, however, contains 
> about 2000 x 2000 x 100 voxels at 16-bit, i.e., 400 million bits or 50 MB. In 
> some data sets, these frames come at 10 Hz or more. I suspect that the 
> I/sigma is also much better in the latter. So, with these data, and keeping a 
> data:parameters ratio of ~4, one could model about 100 million parameters. 
> This type of modelling, or any type of modelling for that matter, remains 
> almost completely absent in the imaging world, perhaps because the data size 
> is currently so unwieldy, perhaps also because sometimes people get nervous 
> about model biases, perhaps also because people are still improving the 
> imaging techniques. But just imagine what could be done with some 
> crystallography-style modelling!
> 
> Jacob Keller
> 
> 
> 
> -Original Message-
> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Tristan 
> Croll
> Sent: Friday, November 10, 2017 8:36 AM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] AW: Re: [ccp4bb] Basic Crystallography/Imaging Conundrum
> 
> Or a nice familiar 2D example: the Ramachandran plot with 7.5 degree binning, 
> as a grid (left) or with bicubic smoothing (right). Different visualisations 
> of the same data, but the right-hand image uses it better.
> 
> On 2017-11-10 08:24, herman.schreu...@sanofi.com wrote:
>> In line with Dale's suggestions, I would suggest that you reformat 
>> your voxel map into the format of an electron density map and l

Re: [ccp4bb] Basic Crystallography/Imaging Conundrum

2017-11-09 Thread Dale Tronrud
The amount of oversampling will depend on what you are
doing to the map (Again, see Gerard's papers.)  Such an oversampled map
will have many more voxels but no more information because the density
values are correlated.
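
   A small numpy sketch makes the point with a one-dimensional "map"
built from a handful of Fourier coefficients (the numbers are made up;
only the band limit matters):

    import numpy as np

    rng = np.random.default_rng(1)

    # A band-limited signal: 9 Fourier coefficients and nothing beyond.
    coef = np.zeros(9, dtype=complex)
    coef[1:8] = rng.normal(size=7) + 1j * rng.normal(size=7)

    # Sampled on a grid that just matches the band limit...
    coarse = np.fft.irfft(coef, n=16)
    # ...and oversampled on a grid four times finer (zero-padded transform).
    fine = np.fft.irfft(coef, n=64) * (64 / 16)

    # The fine map has four times as many "voxels" but no new information:
    # the coarse samples determine every one of them.
    coef_back = np.fft.rfft(coarse)
    fine_again = np.fft.irfft(coef_back, n=64) * (64 / 16)
    print(np.allclose(fine, fine_again))   # True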

Dale Tronrud

On 11/9/2017 4:10 PM, Keller, Jacob wrote:
> Dear Crystallographers,
> 
>  
> 
> I have been considering a thought-experiment of sorts for a while, and
> wonder what you will think about it:
> 
>  
> 
> Consider a diffraction data set which contains 62,500 unique reflections
> from a 50 x 50 x 50 Angstrom unit cell, with each intensity measured
> perfectly with 16-bit depth. (I am not sure what resolution this
> corresponds to, but it would be quite high even in p1, I think--probably
> beyond 1.0 Angstrom?). Thus, there are 62,500 x 16 bits (125 KB) of
> information in this alone, and there is an HKL index associated with
> each intensity, so that I suppose contains information as well. One
> could throw in phases at 16-bit as well, and get a total of 250 KB for
> this dataset.
> 
>  
> 
> Now consider an parallel (equivalent?) data set, but this time instead
> of reflection intensities you have a real space voxel map of the same 50
> x 50 x 50 unit cell consisting of 125,000 voxels, each of which has a
> 16-bit electron density value, and an associated xyz index analogous to
> the hkl above. That makes a total of 250 KB, with each voxel a 1
> Angstrom cube. It seems to me this level of graininess would be really
> hard to interpret, especially for a static picture of a protein
> structure. (see attached: top is a ~1 Ang/pixel down-sampled version of
> the image below).
> 
>  
> 
> Or, if we wanted smaller voxels still, let’s say by half, we would have
> to reduce the bit depth to 2 bits. But this would still only yield
> half-Angstrom voxels, each with only four possible electron density values.
> 
>  
> 
> Is this comparison apt? Off the cuff, I cannot see how a 50 x 50 pixel
> image corresponds at all to the way our maps look, especially at around
> 1 Ang resolution. Please, if you can shoot down the analogy, do.
> 
>  
> 
> Assuming that it is apt, however: is this a possible way to see the
> power of all of our Bayesian modelling? Could one use our modelling
> tools on such a grainy picture and arrive at similar results?
> 
>  
> 
> Are our data sets really this poor in information, and we just model the
> heck out of them, as perhaps evidenced by our scarily low
> data:parameters ratios?
> 
>  
> 
> My underlying motivation in this thought experiment is to illustrate the
> richness in information (and poorness of modelling) that one achieves in
> fluorescence microscopic imaging. If crystallography is any measure of
> the power of modelling, one could really go to town on some of these
> terabyte 5D functional data sets we see around here at Janelia (and on
> YouTube).
> 
>  
> 
> What do you think?
> 
>  
> 
> Jacob Keller
> 
>  
> 
> +
> 
> Jacob Pearson Keller
> 
> Research Scientist / Looger Lab
> 
> HHMI Janelia Research Campus
> 
> 19700 Helix Dr, Ashburn, VA 20147
> 
> (571)209-4000 x3159
> 
> +
> 
>  
> 


Re: [ccp4bb] double cell dimensions between P2 and C2

2017-11-09 Thread Dale Tronrud
   I agree with Phil.  A P2 crystal with nearly perfect
noncrystallographic translational symmetry (~1/2,~1/2,0) will look like
a C2 cell with twice the length along a and b and weak spots between the
indexed spots.  Look for those spots on your "C2" images.

Dale Tronrud

On 11/9/2017 3:06 AM, Phil Evans wrote:
> You should look critically at the indexing of the images for both cases. Does 
> the lattice interpret all spots, or are half of them missing
> 
> 
>> On 9 Nov 2017, at 10:02, Markus Heckmann <markus.21...@gmail.com> wrote:
>>
>> Dear all,
>>> From a small protein, gives crystals P2 with cell
>> Cell 53.16   65.73   72.8990  110.94  90
>> (has 3 molecules in the asymmetric unit). Tested with pointless. Does
>> not give any other possibility.
>>
>> Another crystal if the same protein, similar conditions:
>> C2
>> Cell 109.14  124.37   73.4290  111.75  90. This has 6
>> molecules in the a.s.u. Tested with pointless. Does not give any other
>> possibility.
>> The cell length a, b of C2 is twice that of P2.
>>
>> Is it usual to get such crystals from similar conditions or am I
>> missing something?
>>
>> Many thanks,
>> Mark
> 


Re: [ccp4bb] AW: Another troublesome dataset (High Rfree after MR)

2017-10-16 Thread Dale Tronrud
   Discarding weak data was not the way macromolecular refinement was
done prior to 1990.  Discarding data to lower your R-value is a bad
practice now and was a bad practice back then.  It is my recollection
that some people using X-plor adopted this practice, along with
discarding all low resolution data, but outside of that community these
methods were frowned upon.

   I agree that looking at the agreement between model and data for
subsets of your data is a useful tool for identifying pathologies, but
discarding data in refinement simply because they disagree with your
model is deception.  I know that James is not recommending this, but
that is what some people in that bad period in the 1990's were doing.
Most of us were not!

Dale Tronrud

On 10/16/2017 8:02 AM, James Holton wrote:
> 
> If you suspect that weak data (such as all the spot-free hkls beyond
> your anisotropic resoluiton limits) are driving up your Rwork/Rfree,
> then a good sanity check is to compute "R1".  Most macromolecular
> crystallographers don't know what "R1" is, but it is not only
> commonplace but required in small-molecule crystallography.  All you do
> is throw out all the weak data, say everything with I/sigma < 2 or 3,
> and then re-compute your R factors.  That is, use something like
> "sftools" to select only clearly "observed" reflections, and feed that
> data file back into your refinement program.  In fact, refining only
> against data with I/sigma>3 is the way macromolecular refinement was
> done up until about 1990.  These days, for clarity, you may want to call
> the resulting Rwork/Rfree as R1work and R1free.
> 
> If you do this, and your R1work/R1free are still just as bad as
> Rwork/Rfree, then weak data are not your problem.  You'd be surprised
> how often this is the case.  Next on the list are things like wrong
> symmetry choice, such as twinning masquerading as a symmetry operator,
> or disorder, as in large regions of the molecule that are too fluttery
> to peak above 1 sigma.  The list goes on, but doing the weak-data
> rejection test really helps narrow it down.
> 
> -James Holton
> MAD Scientist
> 
> 
> On 10/16/2017 3:55 AM, herman.schreu...@sanofi.com wrote:
>>
>> Dear Michael,
>>
>>  
>>
>> Did you ask Phaser to check for all possible space groups? There are
>> still I422 and I4 you did not mention. If the space group that came
>> out of Phaser is different from the space group used for processing,
>> subsequent refinement programs may use the wrong space group from the
>> processing. This should be easy to check.
>>
>>  
>>
>> The other suggestion I have is to try a different processing program.
>> Although XDS is excellent, I find that sometimes it has difficulties
>> with ice rings, which reveal themselves not in the processing, but in
>> the subsequent refinement. You may want to try Mosflm or some other
>> processing program.
>>
>>  
>>
>> Best,
>>
>> Herman
>>
>>  
>>
>> *Von:*CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] *Im Auftrag
>> von *Michael Jarva
>> *Gesendet:* Sonntag, 15. Oktober 2017 03:09
>> *An:* CCP4BB@JISCMAIL.AC.UK
>> *Betreff:* [EXTERNAL] [ccp4bb] Another troublesome dataset (High Rfree
>> after MR)
>>
>>  
>>
>> To add to the current anisotropic discussion I recently got a dataset
>> I’m unable to refine and I’m hoping I could get some help on figuring
>> out if there’s anything I can do.
>>
>>  
>>
>> I get a clear cut solution with Phaser using the same protein as
>> search model and got a TFZ of >16, LLG >200, and a packing that makes
>> sense, so I don’t doubt the solution. However, the maps look terrible,
>> more like something I would expect from a 3.65Å dataset rather than
>> the 2.65Å it supposedly is.
>>
>>  
>>
>> The dataset merges well in I4122 to 2.65Å with an overall Rmerge of 5%
>> and a CC1/2 of >0.5 in the outer shell (see the bottom for full
>> summary). There is some minor radiation damage but I could cut out
>> most of it due to the high symmetry.
>>
>>  
>>
>> Xtriage reports no indication of twinning, but does say that the data
>> is moderately anisotropic, so I ran the unmerged data through the
>> StarAniso server, which reported the ellipsoidal resolution limits to
>> be 2.304, 2.893, and 3.039. Refining with the anisotropically
>> truncated data improves the maps somewhat, but I am still unable to
>> get the Rfree below 38%. I tried using both phenix.refine and buster
>> with similar results.
>>
>>  
>>
>>

Re: [ccp4bb] Short peptide outliers

2017-10-01 Thread Dale Tronrud
Hi,

   Bond length and angle targets are defined based on the local
chemistry and apply equally to small and large molecules.  The
Ramachandran distributions were defined via an examination of,
basically, tripeptides.  Your peptide model must be consistent with
these prior observations to be considered reliable.  If it is not, there
is likely something seriously wrong with your interpretation.

   In addition, your model peptide must make chemically reasonable
interactions with its partner.  You didn't describe this aspect of your
model, but this is equally critical in the evaluation of the model of a
bound ligand.

   In my opinion the most likely explanation is that multiple
conformations of the peptide are binding.  Without seeing the density or
being able to examine the data it is hard to generate possibilities.

Dale Tronrud

On 10/1/2017 2:20 AM, Meytal Galilee wrote:
> Hi All,
> I have solved a structure of a protein bound to a short peptide (11
> residues) at 1.9A.
> The peptide fits the map perfectly, however,  all of its residues are
> either Ramachandran / bond length / angle outliers. 
> Fixing any of these issues forces the peptide to misfit the map
> dramatically. 
> Is anyone familiar with short peptides outliers? Are these issues common
> / acceptable?
> Does anyone have an idea or suggestion? 
> Many Thanks,
> Meytal Galilee
>  


Re: [ccp4bb] Refienmnet

2017-08-15 Thread Dale Tronrud
   Your first step is to look at your images and see what is going on in
that shell.  Since you are looking at merged stats the first guess is
that there is something wrong with all of the images, but only by
looking at them can you tell.

Dale Tronrud

On 8/15/2017 9:39 AM, rohit kumar wrote:
> Dear All,
> 
>  
> 
> I am refining a data in refmac5 and the data resolution is 1.8 A. Data
> is looking fine with the data statics see below for data statics
> 
>  
> 
> Inline image 1
> 
>  
> 
> Right now the R/Rfree is 22/26 not good for such resolution. I have
> tried most of the options in refmec5 but still I am not able to lower
> these R/Rfree values. 
> 
> During refinement with phenix what I found that at resolution 1.93 to
> 1.86 R/Rfree is quite high see below 
> 
> 
> Inline image 2
> 
> 
> 
> 
> Is this the possible reason for high R/Rfree value?. If it is  please
> let me know how some can remove these frames during refinement. or
> 
> 
> Please tell me other strategies to lower down the R/Rfree values.  
> 
> 
> Thank you in advance
> 
> 
> With Regards
> 
> Dr. Rohit Kumar Singh
> 
> 


Re: [ccp4bb] Incorrect Structure in the PDB

2017-06-27 Thread Dale Tronrud
   It can't hurt to try to work with the author, and it's the only way
to get the current PDB file out of the database.  If you don't get the
response you want you can still go ahead and look for a venue to publish
your own interpretation.

   Having a paper to go with the model is not only a requirement of the
wwPDB rules but it makes the situation much clearer to someone wanting
to understand this protein.  Having a second model in the PDB w/o a
publication would not leave many clues to decide which model to use.

Dale Tronrud

On 6/27/2017 12:15 AM, Trevor Sewell wrote:
> The misinterpretation is considerable I as can be seen from the attached
> coot screenshot.
> 
>  
> 
> I have no reason to suspect malfeasance. But it looks like the authors
> didn’t check very carefully.
> 
> I have re-interpreted and refined the density and it is just fine –
> Rfactor of 18% for a 2.3A structure.
> 
>  
> 
> The critical reinterpretations concern  the orientation of the backbone
> near the active site and the interpretation of a blob of density claimed
> to be substrate in the original paper.
> 
>  
> 
> Maybe the best would be to write to the author and suggest that she
> obsolete the structure. We could see if we could  reach some agreement
> on how to take it further – perhaps a letter to the editor of JBC.
> 
>  
> 
>  
> 
>  
> 
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for
> Windows 10
> 
>  
> 
> *From: *Manfred S. Weiss <mailto:manfred.we...@helmholtz-berlin.de>
> *Sent: *Tuesday, June 27, 2017 8:46 AM
> *To: *Trevor Sewell <mailto:trevor.sew...@uct.ac.za>
> *Subject: *Re: [ccp4bb] Incorrect Structure in the PDB
> 
>  
> 
> Dear Trevor,
> 
> you can download the incorrect structure and the associated data and
> reinterpret and
> re-refine the structure. Then you can re-deposit provided you write a
> paper about the
> new findings. This is currently the policy of the PDB.
> 
> Else, you can contact the authors of the incorrect structure and do the
> reinterpretation
> together with them? They can replace the incorrect structure without a
> new publication.
> 
> That's all there is at the moment.
> 
> May I ask what is incorrect about the structure?
> 
> Cheers,
> 
> Manfred
> 
> Am 27.06.2017 um 08:34 schrieb Trevor Sewell:
>>
>>  
>>
>> I have come across a key paper in my field that describes an enzyme
>> mechanism. Their work is based on a deposited structure – by other
>> authors - that is incorrectly interpreted.
>>
>>  
>>
>> Is there a process for removing a demonstrably wrong structure
>> (deposited by others) from the PDB and replacing it with a correctly
>> interpreted structure based on the original data? Or is there an
>> alternative, and generally recognized, way of getting the correct
>> structure in the public domain?
>>
>>  
>>
>> Many thanks for your advice on this matter.
>>
>>  
>>
>> Trevor Sewell
>>
>>  
>>
> 
> -- 
> Dr. Manfred S. Weiss
> Macromolecular Crystallography
> Helmholtz-Zentrum Berlin
> Albert-Einstein-Str. 15
> D-12489 Berlin
> Germany


Re: [ccp4bb] Refining a crystal structure with (very) high solvent content

2017-06-02 Thread Dale Tronrud
On 6/2/2017 1:42 PM, wtempel wrote:
> Hello all,
> crystals with high solvent content tend to diffract poorly, at least
> according to intuition. Several years ago we solved a structure
> <http://www.rcsb.org/pdb/explore/explore.do?structureId=2h58> that
> appeared to buck that trend with a solvent content of ≈0.8 and
> resolution beyond 2 Å, per merging statistics and visibility of spots on
> diffraction images.
> I would welcome my colleagues’ opinions as to why I might observe the
> following:
> 
>  1. Paired refinement (similar to Fig. 1 in Karplus
> <http://doi.org/10.1126/science.1218231>) indicates that adding any
> higher resolution data beyond 3.4 Å, the lowest high resolution
> cut-off limit I tried, does not improve R-factors at the common
> lower resolution cutoff. Yes, diffraction is anisotropic in this
> case, but seemingly not to that extent. I hesitate to “throw out”
> all data beyond 3.4 Å, or whatever lower resolution cut-off I might try.
>  2. The Fo-Fc map, when countoured at ± 3 rmsd, includes many more
> (uninterpretable) features than I would expect after refinement to
> residuals in the mid-to-lower twenties. For expected map appearance,
> I had to crank up the coutour level to > 5 rmsd, like in the
> attached screenshot of the ADP·Mg^++ omit map.

   This is one of the prime examples of the failure of describing
contour levels in terms of "sigma".  First, the number you are using is
not a "standard deviation" or any other measure of the error level of
the map but is simply the rms value of the map.  If you calculate the
rms of a difference map where 80% of the unit cell is bulk solvent, and
therefore flat, you will, of course, get a much smaller number than if
the unit cell contained 80% protein with all the expected difference
map features that come from a model with an R value of ~20%.  Then when
you contour at three times this absurdly small number you will see all
sorts of features you are not used to seeing.  Selecting a contour level
based on the e/A^3 is much less sensitive to the amount of solvent in
the crystal and gives much more consistent results.
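
   A toy numpy example of the effect (the feature heights, the noise
level, and the 80%/20% split are invented numbers, chosen only to show
the trend):

    import numpy as np

    rng = np.random.default_rng(2)
    features = rng.normal(0.0, 0.30, 20000)   # region with difference features

    cell_a = np.concatenate([features, rng.normal(0.0, 0.03, 80000)])  # 80% solvent
    cell_b = np.concatenate([features, rng.normal(0.0, 0.03, 5000)])   # 20% solvent

    for name, m in (("80% solvent", cell_a), ("20% solvent", cell_b)):
        rms = m.std()
        print("%s: rms = %.3f  3*rms contour = %.3f  points above 0.18 = %d"
              % (name, rms, 3 * rms, (m > 0.18).sum()))

The same features sit at the same absolute level in both cells, but the
"3 sigma" threshold moves by roughly a factor of two because the rms is
dragged down by the flat solvent.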

Dale Tronrud
> 
> Could these observations be linked to the high solvent content? (1) A
> high solvent content structure has a higher-than-average
> observation-to-parameter ratio, sufficiently high even when limited to
> stronger, low-resolution reflections? (2) Map normalization may not be
> attuned to such high solvent content?
> I am interested in analyzing the automated decision-making of the
> PDB-REDO of this entry <http://www.cmbi.ru.nl/pdb_redo/h5/2h58>, such as
> paired refinement results and selection of ADP model. Should I find this
> information in the “All files (compressed)” archive
> <http://www.cmbi.ru.nl/pdb_redo/cgi-bin/zipper.pl?id=2h58>? The “fully
> optimized structure’
> <http://www.cmbi.ru.nl/pdb_redo/h5/2h58/2h58_final.pdb> shows |ANISOU|
> cards and |NUMBER OF TLS GROUPS : NULL|. Does this mean that individual
> ADPs have been refined anisotropically?
> Looking forward to your insights,
> Wolfram Tempel
> 
> ​


[ccp4bb]

2017-05-18 Thread Dale Tronrud
   I'm sorry but I'm a little confused by your question.  If your map
already has four-fold symmetry why can't you simply build your model
once in one quarter of the map?  What do you hope to change by
specifying that the space group is P4?

Dale Tronrud

On 5/18/2017 10:06 PM, Qingfeng Chen wrote:
> Hi, 
> 
> I have an EM map of a tetrameric protein. It was painful to work with
> this map since it is in P1 spacegroup, although 4-fold symmetry was
> already applied during map reconstruction. 
> 
> I noticed that people used MAPMAN to transform spacegroup, however, it
> seems not working for me. The map remained in P1 spacegroup afterwards. 
> 
> I used mtz file converted from .mrc and the tetrameric protein model as
> input and choose "run fft to generate simple map". I also specified
> "output map in ccp4 format to cover all atoms in pdb". In "infrequently
> used options", I input P4 in "generate map in spacegroup". Everything
> else was left as default. 
> 
> Any suggestions will be appreciated. 
> 
> Thanks! 


Re: [ccp4bb] NAD dihedral for C2N-C3N-C7N-N7N

2017-05-18 Thread Dale Tronrud
   I have looked over a number of high resolution models with NAD+ and
NADH in the PDB as well as small molecule structures.  I also have some
familiarity with similar chemistry in the decorations on the edge of
bacteriochlorophyll-a molecules.  The CONH2 group does flip over when
the hydrogen bonding environment calls for it.  It is very hard to tell
the difference between the oxygen atom and the nitrogen atom from the
appearance of the electron density so you always have to check the
hydrogen bonding environment when building an NAD? model.

   I have seen one case where a Ser -> Ala mutation in the protein
caused the group to flip with interesting consequences on the far side
of the co-factor.  My go-to QM person tells me that flipping this group
will change the energies of the molecular orbitals and therefore the
redox potential of the NAD? molecule so this conformational change may
be important to the action of your catalysis.

   I have also seen a number of NAD? models in the PDB where this group
is clearly misorientated.

   As you note, the torsion angle should be close to zero or 180.
However it is unlikely to have exactly those values because there are
non-bonded clashes when everything is in one plane.  Some restraint
libraries inappropriately restrain this group to be co-planar with the
six-membered ring.  As always, check your CIF!

Dale Tronrud


On 5/17/2017 12:46 PM, Jorge Iulek wrote:
> Dear all,
> 
> I came across some difficulty to refine a NAD molecule in a
> structure, specially its amide of the nicotinamide moiety.
> A (very) brief search in deposited structures seems to point that
> not so ever the C2N-C3N-C7N-N7N dihedral is close to either 0 or 180
> degrees, but in most cases it is to one of these, with a preference
> towards 0 degrees. Another search in the literature, and I could not
> find any study on either NAD or even the nicotinamide alone to calculate
> the energy barrier to rotate around this bond (in vacuum, eg).
> My data quality and resolution do not put much confidence on
> B-factor differences, but they seem to indicate that the cited dihedral
> angle should be close to 180 degrees, id est, O7N is "closer" to C2N
> (and, consequently, to N1N) than N7N is. In fact, I have a glutamine
> nearby whose terminal amide is interacting with the nicotinamide amide,
> so my idea is to make one's nitrogen to interact with other's oxygen.
> Concerning b-factor differences for this glutamine, they favor its NE2
> to point to nicotinamide amide, what would imply that the
> C2N-C3N-C7N-N7N dihedral to would be close to 180 degrees rather than 0
> degree.
> Is there any wide study on NAD nicotinamide amide conformation?
> Specially, bound to protein structures?
> Thanks,
> 
> Jorge
> 


Re: [ccp4bb] on the resoution of crystal

2017-02-05 Thread Dale Tronrud
   It is confusing, but "high" is meant to indicate the quality of the
final electron density map based on the data.  Your 1.8 A data set will
give the better map, and is the high resolution data set.

Dale Tronrud

On 2/5/2017 4:09 AM,  wrote:
> Dear All,
> 
> For one protein crystal, its resolution was 1.8 A. For another crystal
> for the same protein,  its resolution was 3.8 A. In literature, do we
> call the 1.8 A crystal as the high resolution crystal (because of
> quality), or do we call the 3.8 A crystal as the high resolution crystal
> (because of 3.8 was larger than 1,8)?
> 
> Best regards.
> 
> John


Re: [ccp4bb] Rfactor and Rfree not coming below 0.4

2017-01-20 Thread Dale Tronrud
   The R value is basically a sum of all the Fourier coefficients of the
difference map relative to their Fobs amplitude.  If your R value is in
the 40% range you MUST have a difference map with a lot of stuff in it.
The only way you can have a clean difference map with an R value so
large is if there is a mistake either in the calculation of the R value
or the map.
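
   For reference, the conventional R value and the Fo-Fc map are built
from the same differences, R = sum(| |Fo| - |Fc| |) / sum(|Fo|), so a
large R guarantees large difference features.  A minimal numpy check of
the bookkeeping (the amplitudes below are fabricated, not from any real
data set):

    import numpy as np

    rng = np.random.default_rng(3)
    f_obs = rng.gamma(2.0, 50.0, 1000)                     # fake observed amplitudes
    f_calc = np.abs(f_obs * rng.normal(1.0, 0.5, 1000))    # a poorly fitting model

    r_value = np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)
    print("R = %.3f" % r_value)    # prints an R in the neighborhood of 0.4

    # The Fo-Fc map is (up to scaling) the Fourier transform of these same
    # amplitude differences with model phases attached, so an R near 0.4
    # necessarily puts a lot of amplitude into that map.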

   I think you need to double-check your work.

Dale Tronrud

P.S. Are you contouring your difference map at a reasonable level?
Think e/A^3 not "sigma"s.  When I look at a difference map I start at a
level of 0.18 e/A^3 and then think about other levels.  There is nothing
magical about that number but looking at all maps the same way helps you
learn what a map should look like.  If you want an internal calibration,
you can leave out one, well ordered, water molecule.  Then you can see
what the difference density due to an absent atom looks like in YOUR
map, and compare that to the other features you see.

   Remember, if you have stuff EVERYWHERE in your difference map, you
will see NOTHING at a 3 rmsd contour because the rmsd of the map is huge!

On 1/20/2017 10:45 PM, ameya benz wrote:
> I am trying to refine my data set collected at 1.9A. The density appears
> clean and the fit is also good. In coot, all residues are in green zone
> in density fit analysis. Also in Ramachandran plot 89% residues are in
> allowed region. Few residues in loop region do not have good density for
> side chains. My protein contains about 61% loops and remaining beta
> sheets , no alpha helices (predicted from homology model). What could be
> done to improve the Rfree and Rfactor?


Re: [ccp4bb] Calculation of RSRZ Score in PDB Validation Reports

2016-11-28 Thread Dale Tronrud
On 11/28/2016 12:52 PM, esse...@helix.nih.gov wrote:
>> I found that one can get RSRZ to go way down by loosening the geometry
>> restraints.  The result is a crappy structure and I don't recommend doing
>> that, but it does get all the atoms crammed into some sort of density.
> 
>   Your observation is quite interesting. I can add this: when we were working
> with low to medium resolution structures, deleting the hydrogen atoms from
> the model after refinement moved the very bad RSRZ statistic to about the
> average in the given resolution range! Note, no re-refinement was done just
> a simple deletion of the riding H-atoms. I find this to be odd given the
> fact that, say the phenix developers favor the inclusion of H-atoms on
> riding positions even in cases of low resolution structures. (I assume the
> refmac5 and BUSTER-TNT developers have also a favorable opinion about
> including H-atoms in the final model - and during refinement).
> 
> In my mind, it may be tempting to delete H-atoms to improve this statistic but
> when you use them in refinement they should be included regardless of the
> outcome of the RSRZ analysis.

   Of course, if you trick a validation statistic like this you haven't
accomplished anything.  All you are saying is that one should rank RSRZ
scores with and without hydrogen atoms separately.  Perhaps you should
suggest that to the PDB validation people.

Dale Tronrud
> 
>>
>> RSRZ, in my most humble of opinions, seems like one of those statistics that
>> is far more useful in theory than reality.   Particularly for
>> medium-resolution structures, the fit of each entire side chain to the 
>> density
>> is likely to be imperfect because the density is imperfect, especially toward
>> the tips of those side chains.
>>
>> Then again, it can be a good flag for bits of the structure worth a second
>> look in rebuilding.
> 
>   The latter is certainly true. It may mean that the developers of RSRZ
> analysis need to tune it a bit to make it fully useful.
> 
> L.
> 
>>
>> 
>> From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Matthew
>> Bratkowski [mab...@cornell.edu]
>> Sent: Tuesday, November 22, 2016 10:12 AM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: [ccp4bb] Calculation of RSRZ Score in PDB Validation Reports
>>
>> Hello all,
>>
>> I was wondering if anyone knew how the RSRZ score was calculated in the
>> protein data bank validation reports and how useful of a metric this actually
>> is for structure validation?  I am trying to improve this score on a 
>> structure
>> that I am working on, but I'm not really sure where to begin.  From my
>> understanding, the score is based on the number of RSRZ outliers with a score
>>> 2.  In my case, I have several residues with scores between 2 and 4, but at
>> least by eye, fit to the electron density does not look that bad.  Hence, I
>> can't justify deleting them to try to improve the score.  If the score is 
>> just
>> based on percent of outlier residues, then for instance wouldn't a structure
>> with say 20 residues modeled with no corresponding electron density have the
>> same score as a structure with 20 residues with RSRZ values of say 2.5?
>>
>>
>> I was also wondering how the resolution of the structure relates to the 
>> score?
>>  Glancing through several pdb validation reports, I noticed some structure
>> with low resolution (3.5 A or lower) with relatively high scores, while 
>> others
>> with high resolution (2 A or higher) getting low scores.  It is reasonable to
>> assume that a structure of lower than 3.5 A would be missing several side
>> chains and may also have some ambiguous main chain electron density, which
>> should in theory increase the RSRZ score.  While of course every structure is
>> different and the quality of it due to the rigor of the person building the
>> model, I was wondering if there were any general trends related to resolution
>> and RSRZ score.
>>
>> Thanks,
>> Matt
>>
> 


Re: [ccp4bb] High B factor

2016-10-14 Thread Dale Tronrud

   I agree that it would certainly be bad if your model was inconsistent
with the Wilson B.  When I first ran into people who were worried about
this I recommended that they calculate the Wilson B of their model, not
look at the average B as a proxy.  You can calculate structure factors
from your model and feed those into a Wilson B calculator and get the
proper value.
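
   If you want to make that comparison yourself, the Wilson B can be
estimated from any list of amplitudes and resolutions with a straight-line
fit to the Wilson plot.  A rough numpy sketch (the 3.5 A cutoff and the
binning are arbitrary choices, and a real calculation would include the
atomic scattering-factor falloff, so treat the result as an estimate):

    import numpy as np

    def wilson_b(f_amp, d_spacing, d_max=3.5, nbins=20):
        """Rough Wilson B from amplitudes and d-spacings (Angstrom)."""
        s2 = 1.0 / (4.0 * d_spacing ** 2)        # (sin(theta)/lambda)^2
        sel = d_spacing < d_max                  # keep the higher-resolution data
        s2, inten = s2[sel], f_amp[sel] ** 2
        edges = np.quantile(s2, np.linspace(0.0, 1.0, nbins + 1))
        idx = np.clip(np.digitize(s2, edges) - 1, 0, nbins - 1)
        x = np.array([s2[idx == i].mean() for i in range(nbins)])
        y = np.array([np.log(inten[idx == i].mean()) for i in range(nbins)])
        slope, _ = np.polyfit(x, y, 1)           # ln<I> ~ const - 2B * s^2
        return -slope / 2.0

Feed it the Fcalc amplitudes written out by your refinement program and
compare the number to the Wilson B reported for the observed data.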

   When you do this you find pretty quickly that a refinement program
has to be screwed up pretty badly for it to create a model with even a
small difference between the observed and calculated Wilson B.  If your
model is off only a little the R value shoots up!  Other than when
debugging refinement code, I have never seen a refined model where there
was any significant difference between the calculated and observed
Wilson B.  This problem simply doesn't come up.

   Try it for yourself and see.

   When you see a difference between the Wilson B and the average of the
B's in your model, all you are seeing is that you have a variety of B
factors and that is not a surprise.  I would be worried if all my B
factors were equal!

   High B factors are not a problem to be fixed!  Some atoms do move
more than others.  Why would you expect otherwise?

Dale Tronrud

On 10/14/2016 12:28 AM, Carlos CONTRERAS MARTEL wrote:
> Sunanda,
> 
> As "common people", ... "agreement" don't means for me "equals" ...
> 
> So I hope I'm not so "incorrect" if I keep one mind the Wilson B as the
> model
> 
> refinement goes and the average B factor changes with different strategies.
> 
> If you are still worry about your Bfactor, you could try TLS, using the
> right approach could
> 
> improve your results.
> 
> All the best
> 
> Carlos
> 
> 
> 
> On 10/13/16 18:42, Dale Tronrud wrote:
>>It seems to be common for people to make the incorrect assumption
>> that the average of the atomic B factors in the PDB file should equal
>> the Wilson B.  The Wilson B is actually a weighted average of the
>> individual B factors where the weighting is rather mysterious, but the
>> small B factors have much higher weight than the larger ones. While the
>> Wilson B is simply a property of your data set the average of the atomic
>> B factors will also depend on choices you have made in building your
>> model.
>>
>>In your case the core of your protein does have B factors that match
>> your Wilson B.  The fact that you have loops that have larger B's
>> doesn't really mean there is a problem, because those atoms don't
>> contribute much to the higher resolution reflections and don't come into
>> the calculation of the Wilson B.
>>
>>The average of your B factors will increase if you choose to build
>> model for more disordered regions, while the Wilson B will, of course,
>> remain the same.  The average B factor will be larger, for example, if
>> you are more aggressive in building water molecules.  This does not
>> indicate a problem but is an unavoidable consequence of your choice to
>> take more risks.
>>
>> Dale
>>
>>
>>
>> On 10/13/2016 12:16 AM, sunanda williams wrote:
>>> Sorry for not making the problem clear!
>>> The overall B factor for the 3.0 A structure is around 98.00 A*2. Most
>>> of the deposited structures in the PDB site around this resolution
>>> has an av B around 70 A*2
>>> All other statistics looks fine. I got a warning message while trying to
>>> upload the structure in PDB that 98 was higher than the norm!
>>> The reason why the structure has such a high B factor could be due to
>>> disordered loops... All the same I was wondering how acceptable this
>>> structure would be to picky reviewers..
>>> And 'better the B factors' means bring them down...sorry couldn't phrase
>>> it 'better'! I am going to try doing refinement with group B factors!
>>> Would that help?
>>> Thanks Robbie, I think this I need to make the best of the model I have.
>>> Prof. Sekar will try PDB_redo..Thanks!
>>>
>>>
>>>
>>> On Thu, Oct 13, 2016 at 11:41 AM, Pavel Afonine <pafon...@gmail.com
>>> <mailto:pafon...@gmail.com>> wrote:
>>>
>>> I fully agree with Dale in not understanding what the problem is.
>>> Perhaps I have a better chance if you clearly explain what exactly
>>> you mean by "is there any way to better the B factors".
>>> Pavel
>>>
>>>
>>> On Thu, Oct 13, 2016 at 12:57 PM, Dale Tronrud
>>> <de...@daletronrud.com <mailto:de...@daletronrud.com>> wrote:
>>>
>>>

Re: [ccp4bb] High B factor

2016-10-13 Thread Dale Tronrud

   It seems to be common for people to make the incorrect assumption
that the average of the atomic B factors in the PDB file should equal
the Wilson B.  The Wilson B is actually a weighted average of the
individual B factors where the weighting is rather mysterious, but the
small B factors have much higher weight than the larger ones. While the
Wilson B is simply a property of your data set the average of the atomic
B factors will also depend on choices you have made in building your
model.

   In your case the core of your protein does have B factors that match
your Wilson B.  The fact that you have loops that have larger B's
doesn't really mean there is a problem, because those atoms don't
contribute much to the higher resolution reflections and don't come into
the calculation of the Wilson B.

   The average of your B factors will increase if you choose to build
model for more disordered regions, while the Wilson B will, of course,
remain the same.  The average B factor will be larger, for example, if
you are more aggressive in building water molecules.  This does not
indicate a problem but is an unavoidable consequence of your choice to
take more risks.

Dale



On 10/13/2016 12:16 AM, sunanda williams wrote:
> Sorry for not making the problem clear!
> The overall B factor for the 3.0 A structure is around 98.00 A*2. Most
> of the deposited structures in the PDB site around this resolution
> has an av B around 70 A*2
> All other statistics looks fine. I got a warning message while trying to
> upload the structure in PDB that 98 was higher than the norm!
> The reason why the structure has such a high B factor could be due to
> disordered loops... All the same I was wondering how acceptable this
> structure would be to picky reviewers..
> And 'better the B factors' means bring them down...sorry couldn't phrase
> it 'better'! I am going to try doing refinement with group B factors!
> Would that help?
> Thanks Robbie, I think this I need to make the best of the model I have.
> Prof. Sekar will try PDB_redo..Thanks!
> 
> 
> 
> On Thu, Oct 13, 2016 at 11:41 AM, Pavel Afonine <pafon...@gmail.com
> <mailto:pafon...@gmail.com>> wrote:
> 
> I fully agree with Dale in not understanding what the problem is.
> Perhaps I have a better chance if you clearly explain what exactly
> you mean by "is there any way to better the B factors".
> Pavel
> 
> 
> On Thu, Oct 13, 2016 at 12:57 PM, Dale Tronrud
> <de...@daletronrud.com <mailto:de...@daletronrud.com>> wrote:
> 
>I'm sorry but I don't understand what your problem is.  Do
> you think
> the B factors are too small for a 3A data set?  A range of 70 to
> 75 is a
> little smaller than usual but probably not out of bounds.
> 
> Dale Tronrud
> 
> On 10/12/2016 7:59 PM, sunanda williams wrote:
> > Hi all,
> > I have a structure at 3.0 A and R/Rfree of 24/28. The mean B
> value is
> > around 98. The B value is especially high at the N terminus
> and two loop
> > regions (around 120-150 AA).
> > The rest of the structure averaged around 70-75. Has anyone
> faced such a
> > scenario? How reliable is the structure and is there any way
> to better
> > the B factors.
> > Any help is appreciated.
> > Thank you!!
> 
> 
> 





Re: [ccp4bb] High B factor

2016-10-12 Thread Dale Tronrud
   I'm sorry but I don't understand what your problem is.  Do you think
the B factors are too small for a 3A data set?  A range of 70 to 75 is a
little smaller than usual but probably not out of bounds.

Dale Tronrud

On 10/12/2016 7:59 PM, sunanda williams wrote:
> Hi all,
> I have a structure at 3.0 A and R/Rfree of 24/28. The mean B value is
> around 98. The B value is especially high at the N terminus and two loop
> regions (around 120-150 AA).
> The rest of the structure averaged around 70-75. Has anyone faced such a
> scenario? How reliable is the structure and is there any way to better
> the B factors.
> Any help is appreciated.
> Thank you!!


Re: [ccp4bb] paired refinement

2015-07-07 Thread Dale Tronrud
   Comparing stats on geometry as well as R values is confounded by the
problem of choosing weights.  Should you hold the weights fixed or
perform an individualized weight optimization at each step?

   More importantly, our geometry libraries are imperfect.  We know from
surveys that the fit of models to Engh & Huber gets worse as the
resolution of their X-ray data gets higher.  This is because there are
real (and to a good extent conformationally dependent) variations in
bond angles that are brought to light by the high resolution (better
than about 1.6A) data.

   If I add quality high resolution data I would expect the geometry
stats to get worst.

Dale Tronrud

On 7/7/2015 11:17 AM, Shane Caldwell wrote:
 Chiming in late with a follow-up question:
 
On the other hand in paired refinement, if adding the data improves the
 structure
as measured by Rfree in a zone excluding the added data, then it is
 hard to deny
that that data are worth including.
 
 Is it correct to think that model geometry would also be valuable at
 this point? If adding reflections lead to a model with more reasonable
 average bond angles, reduced clashes, etc., that would indicate that the
 added reflections have improved the refinement, no? The geometry stats
 should also be completely independent from the crystallographic stats.
 
 Shane Caldwell
 McGill University
 
 
 
 On Fri, Jul 3, 2015 at 2:56 AM, Kay Diederichs
 kay.diederi...@uni-konstanz.de mailto:kay.diederi...@uni-konstanz.de
 wrote:
 
 On Thu, 2 Jul 2015 13:25:07 -0400, Edward A. Berry
 ber...@upstate.edu mailto:ber...@upstate.edu wrote:
 
 My take on this-
 No one has been willing to specify a cutoff (and probably there is no 
 rigorous way to
 mathematically define the cutoff) and say If CC* (or CCfree or 
 whatever) is below X
 then it will not improve your structure, if above X then it will.
 
 the electron microscopy community uses a similar measure (FSC,
 Fourier Shell Correlation) as CC1/2. They follow their Gold
 Standard if they cut their data at FSC=0.143 . Mathematically,
 CC1/2=0.143 is equivalent to CC*=0.5 . So researchers of that
 community _do_ specify a cutoff, and by calling it Gold Standard
 they cast it in stone. Very helpful because probably no reviewer
 ever calls a Gold Standard into question.
 
  Probably depends
 among other things on how strong the lower resolution data is, how good 
 the
 structure is without the added data.
 
 the latter point is crucial: a structure that is good can make use
 of higher-resolution data than a structure that is not properly
 refined. That is different from the situation in electron
 microscopy, where the phases are obtained experimentally. This is
 why an X-ray structure at an early stage of iterative
 refinement/fitting may possibly not be improved by the weak
 high-resolution data , and why the paired refinement should be done
 during the end game of refinement. But as long as CC1/2 is
 statistically significant it may improve a very good model even if
 CC*<0.5 (see Bublitz et al IUCrJ 2, 409-420 (2015) for an example).
 
 To find out how close the structure to the data is, it helps to
 compare CCwork and CC*.
 
 An arbitrary cutoff (like CC*=0.5) is thus not sensible in all
 situations; it may serve as a rule of thumb though since the
 difference in resolution limit is not large anyway: CC*=0.5 and
 CC*=0.3 usually differ by (say) 0.1A only.
 
 On the other hand in paired refinement, if adding the data improves the 
 structure
 as measured by Rfree in a zone excluding the added data, then it is hard 
 to deny
 that that data are worth including.
 
 Absolutely. Much better than to believe in rules of thumb.
 
 best,
 
 Kay
 
 
 eab
 
 On 07/02/2015 12:52 PM, Keller, Jacob wrote:
  Wasn’t all of this put to bed through the implementation of CC
 measures?
 
  JPK
 
  *From:*CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK
 mailto:CCP4BB@JISCMAIL.AC.UK] *On Behalf Of *Robbie Joosten
  *Sent:* Thursday, July 02, 2015 12:46 PM
  *To:* CCP4BB@JISCMAIL.AC.UK mailto:CCP4BB@JISCMAIL.AC.UK
  *Subject:* Re: [ccp4bb] paired refinement
 
  But it is not the R-free of the shell here. In paired refinement
 you take the R-free of the reflections outside the shell.
 
  Cheers,
  Robbie
 
  Sent with my Windows Phone

Re: [ccp4bb] paired refinement

2015-07-03 Thread Dale Tronrud
Dear Kay,

   You are right.  I had forgotten the normalization part of the CC
calculation.
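
A two-line numpy illustration of the point, with every "amplitude" set
to the same invented value:

    import numpy as np

    half1 = np.full(100, 11.0)
    half2 = np.full(100, 11.0)
    print(np.corrcoef(half1, half2)[0, 1])   # nan: zero variance, so the
                                             # correlation is undefined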

Dale

On 7/3/2015 12:01 AM, Kay Diederichs wrote:
 Hi Dale,
 
 On Thu, 2 Jul 2015 10:45:45 -0700, Dale Tronrud de...@daletronrud.com wrote:
 
   While I was puzzling over an entry in the PDB some years ago (since
 obsoleted) I noticed that all the high resolution amplitudes were equal
 to 11.0!  This was before CC1/2 but for this structure it would have
 been equal to one, and yet the outer data were useless.  
 
 no, the correlation coefficient between data that are exactly the same (which 
 you imply) is (mathematically) undefined. Most implementations would either 
 crash with a divide-by-zero error, or (if they catch the problem) give a 
 correlation of zero. The latter is sensible because the correlation is 
 invariant to adding an offset to one (or both) of the variables that are 
 compared.
 
 (You know all of this, and) I just want to point out that this is not a valid 
 example where CC1/2 would be 1.
 
 best,
 
 Kay
 
 A practical
 test like paired refinement can't be fooled in this way.

 Dale Tronrud

 On 7/2/2015 10:25 AM, Edward A. Berry wrote:
 My take on this-
 No one has been willing to specify a cutoff (and probably there is no
 rigorous way to
 mathematically define the cutoff) and say If CC* (or CCfree or
 whatever) is below X
 then it will not improve your structure, if above X then it will.
 Probably depends
 among other things on how strong the lower resolution data is, how good the
 structure is without the added data.
 On the other hand in paired refinement, if adding the data improves the
 structure
 as measured by Rfree in a zone excluding the added data, then it is hard
 to deny
 that that data are worth including.

 eab

 On 07/02/2015 12:52 PM, Keller, Jacob wrote:
 Wasn’t all of this put to bed through the implementation of CC measures?

 JPK

 *From:*CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] *On Behalf
 Of *Robbie Joosten
 *Sent:* Thursday, July 02, 2015 12:46 PM
 *To:* CCP4BB@JISCMAIL.AC.UK
 *Subject:* Re: [ccp4bb] paired refinement

 But it is not the R-free of the shell here. In paired refinement you
 take the R-free of the reflections outside the shell.

 Cheers,
 Robbie

 Sent with my Windows Phone

 -
 --
 ---


 *Van: *Edward A. Berry mailto:ber...@upstate.edu
 *Verzonden: *‎2-‎7-‎2015 18:43
 *Aan: *CCP4BB@JISCMAIL.AC.UK mailto:CCP4BB@JISCMAIL.AC.UK
 *Onderwerp: *Re: [ccp4bb] paired refinement

 Another criterion for cutoff, also requiring the structure to be solved,
 is the agreement between data and structure, e.g. Rfree or CCfree.
 I think it is very unlikely that you could get Rfree =.2493 in a shell
 which contains only noise. So I would suggest doing paired refinement
 to 2.2 and 2.1 A (if the data is available).

 On 07/01/2015 07:15 PM, Eric Karg wrote:
   Hi all,
  
   I have a dataset processed in XDS to 2.3 A (based on CC1/2). I'm
 trying to do paired refinement to determine the optimal resolution
 cutoff. Here is what I get at different resolutions set in Phenix:
  
   Final Rfree/Rwork:
   2.7— 0.2498/0.2027
   2.6— 0.2519/0.2009
   2.5— 0.2567/0.2025
   2.4 — 0.2481/0.2042
   2.3 — 0.2493/0.2075
  
   The geometry of all output structures are similar.
  
   1. What is the high resolution cutoff based on these data? I know
 that Rfree/Rwork at different resolution should not be compared, but
 is there a simple way to do the test as described in the KD 2012
 Science paper using Phenix GUI?
  
   2. For refining a structure at a lower resolution (lower than the
 initial dataset), do I simply set the resolution limit in the
 refinement or I need to reprocess the data starting from the images?
 Do I need to do anything with Rfree flags? Based on the discussions on
 this forum I know I should deposit the highest resolution dataset but
 my question is about the mtz file which will be used for refinement.
  
   Thank you very much for your help!
  



Re: [ccp4bb] paired refinement

2015-07-02 Thread Dale Tronrud
   While I was puzzling over an entry in the PDB some years ago (since
obsoleted) I noticed that all the high resolution amplitudes were equal
to 11.0!  This was before CC1/2 but for this structure it would have
been equal to one, and yet the outer data were useless.  A practical
test like paired refinement can't be fooled in this way.

Dale Tronrud

On 7/2/2015 10:25 AM, Edward A. Berry wrote:
 My take on this-
 No one has been willing to specify a cutoff (and probably there is no
 rigorous way to
 mathematically define the cutoff) and say If CC* (or CCfree or
 whatever) is below X
 then it will not improve your structure, if above X then it will.
 Probably depends
 among other things on how strong the lower resolution data is, how good the
 structure is without the added data.
 On the other hand in paired refinement, if adding the data improves the
 structure
 as measured by Rfree in a zone excluding the added data, then it is hard
 to deny
 that that data are worth including.
 
 eab
 
 On 07/02/2015 12:52 PM, Keller, Jacob wrote:
 Wasn’t all of this put to bed through the implementation of CC measures?

 JPK

 *From:*CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] *On Behalf
 Of *Robbie Joosten
 *Sent:* Thursday, July 02, 2015 12:46 PM
 *To:* CCP4BB@JISCMAIL.AC.UK
 *Subject:* Re: [ccp4bb] paired refinement

 But it is not the R-free of the shell here. In paired refinement you
 take the R-free of the reflections outside the shell.

 Cheers,
 Robbie

 Sent with my Windows Phone

 ---
 ---


 *Van: *Edward A. Berry mailto:ber...@upstate.edu
 *Verzonden: *‎2-‎7-‎2015 18:43
 *Aan: *CCP4BB@JISCMAIL.AC.UK mailto:CCP4BB@JISCMAIL.AC.UK
 *Onderwerp: *Re: [ccp4bb] paired refinement

 Another criterion for cutoff, also requiring the structure to be solved,
 is the agreement between data and structure, e.g. Rfree or CCfree.
 I think it is very unlikely that you could get Rfree =.2493 in a shell
 which contains only noise. So I would suggest doing paired refinement
 to 2.2 and 2.1 A (if the data is available).

 On 07/01/2015 07:15 PM, Eric Karg wrote:
   Hi all,
  
   I have a dataset processed in XDS to 2.3 A (based on CC1/2). I'm
 trying to do paired refinement to determine the optimal resolution
 cutoff. Here is what I get at different resolutions set in Phenix:
  
   Final Rfree/Rwork:
   2.7— 0.2498/0.2027
   2.6— 0.2519/0.2009
   2.5— 0.2567/0.2025
   2.4 — 0.2481/0.2042
   2.3 — 0.2493/0.2075
  
   The geometry of all output structures are similar.
  
   1. What is the high resolution cutoff based on these data? I know
 that Rfree/Rwork at different resolution should not be compared, but
 is there a simple way to do the test as described in the KD 2012
 Science paper using Phenix GUI?
  
   2. For refining a structure at a lower resolution (lower than the
 initial dataset), do I simply set the resolution limit in the
 refinement or I need to reprocess the data starting from the images?
 Do I need to do anything with Rfree flags? Based on the discussions on
 this forum I know I should deposit the highest resolution dataset but
 my question is about the mtz file which will be used for refinement.
  
   Thank you very much for your help!
  



Re: [ccp4bb] Coot and Pymol through SSH by Xming/PUTTY on a windows client?

2015-07-01 Thread Dale Tronrud


   Both the ssh client and server must be set up with X11Forwarding
yes.  The message sounds like your local computer is not set up to
accept X11 tunneling.  (By the way, in the X11 world the remote system
is the client and your local system the server.)
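
   The usual places to check (the option names below are the stock
OpenSSH and PuTTY ones; adjust paths for your own systems):

    # /etc/ssh/sshd_config on the remote machine:
    X11Forwarding yes

xauth must be installed on the remote machine as well, and sshd restarted
after any change.  On the Windows side, start Xming before connecting and
tick Connection -> SSH -> X11 -> "Enable X11 forwarding" in PuTTY.  After
logging in, "echo $DISPLAY" should report something like localhost:10.0;
if it is empty, the forwarding was never set up.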

Dale Tronrud

On 7/1/2015 3:40 PM, Chen Zhao wrote:
 Hi all,
 
 Sorry to bother you, but I am trying to fix a long-standing problem
 that I cannot run Coot and Pymol through Xming/PUTTY by SSH
 connection on a windows client. The error messages are pretty
 similar for both: Coot: PuTTY X11 proxy: unable to connect to
 forwarded X server: Network error: Connection refused 
 (coot-real:23113): Gtk-WARNING **: cannot open display:
 localhost:10.0 Pymol: PuTTY X11 proxy: unable to connect to
 forwarded X server: Network error: Connection refused freeglut
 (pymol): failed to open display 'localhost:10.0'
 
 Does anybody have some ideas?
 
 Thank you so much, Chen


Re: [ccp4bb] distorted phosphate molecule geometry after refinement

2015-06-22 Thread Dale Tronrud
   It is possible that your PO4 has its atoms labeled with the wrong
chirality.  Yes, I know that PO4 is not chiral when you ignore hydrogen
atoms and single/double bonds but adding labels creates an unnatural
chirality.  Try your refinement again after switching the labels on two
oxygen atoms.
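
   If you would rather script the swap than edit the labels by hand, a
few lines of Python are enough (the atom, residue, and file names here
are assumptions -- check the result before refining):

    # minimal sketch: swap the O1/O2 labels on PO4 records of a PDB file
    with open('model.pdb') as f_in, open('model_swapped.pdb', 'w') as f_out:
        for line in f_in:
            if line.startswith(('ATOM', 'HETATM')) and line[17:20].strip() == 'PO4':
                name = line[12:16].strip()
                if name in ('O1', 'O2'):
                    other = 'O2' if name == 'O1' else 'O1'
                    # same-length names, so the column alignment is preserved
                    line = line[:12] + line[12:16].replace(name, other) + line[16:]
            f_out.write(line)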

Dale Tronrud

On 6/22/2015 7:48 AM, ansuman biswas wrote:
 Dear CCP4 users,
 
 I am working on a protein from a hyperthermophilic archaeon.
 
 I have collected mutliple X-Ray datasets, both from home source and
 synchrotron and always found a clear density for tetrahedral geometry,
 co-ordinated by two histidines and one lysine. 
 
 I tried fitting phosphate there, but its geometry always gets distorted
 after each refinement cycle (Refmac 5.8.0073). Also I found some short
 contacts between the coordinated residues and phosphate which were very
 difficult to remove.
 
 I am attaching a figure with the density and phosphate. 
 
 Kindly suggest -
 1. if this may be a possible modification of any of the associated
 residues, and the code of the modified residue to be used.
 
 2. If the ligand requires separate restraints during refinement, I am
 using the restrained refinement option available at the top of the GUI
 for refmac.
 
 Thanking you,
 yours sincerely,
 Ansuman Biswas,
 PhD student,
 Dept. of Physics,
 IISc
 


Re: [ccp4bb] Residual density feature

2015-06-16 Thread Dale Tronrud


On 6/14/2015 2:49 AM, Robbie Joosten wrote:
 Hi Colin,
 
 You can define an UNL (unknown ligand) in the blob. This is the
 standard name for such a compound. It becomes a bit messy in
 refinement in terms of restraints, but it does exactly what you
 want it to do: say you noticed the blob but couldn't figure out
 what it was.
 
   There is danger along this route - but I suppose this is a
philosophical matter.  Your PDB file is your model for what is in the
crystal.  If you don't know what a compound in your crystal is, can
you say that you have modeled it?  What are your intentions when you
place this unknown residue in your model -- are you just placing a
marker to tell those who follow that you noticed something here, or are
you saying this is how the electrons are arranged here?

   Recently I was looking at 1I1W for some reason.  It is a very nice
model at 0.89 A resolution.  It contains, however, several single atom
residues of type UNX and the atoms are of type X.  The problem I have
is that these atoms have their occupancy set to 1.0 and have varying B
factors.  What am I supposed to make of a fully occupied atom of
unknown type?

   Apparently I'm not alone in my confusion.  The Electron Density
Server does not supply a map for this entry.  I can see nothing wrong
with the data nor the rest of the model so I presume the service is
failing because it doesn't know how to calculate the scattering of an
atom of unknown type.

   So, my question is - if you place marker atoms to indicate the
binding of an unknown ligand, should the difference density disappear
from your map?  Certainly if you are interested in bootstrapping phase
information you would want this, but is this appropriate for deposition?
Should you be able to say your difference map is flat and your R value
low when all you did was shrug your shoulders and say "I don't know"?

   There is a, sadly underused, feature of the PDB format called the
SITE record.  With it you can describe the residues of your protein
that form a binding site and give a text description in a REMARK 800
statement.  This option would allow you to acknowledge that something
is binding at this location but leave the difference peak for others
to view and puzzle over on their own.

Dale Tronrud

 Cheers, Robbie
 
 -Original Message- From: CCP4 bulletin board
 [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Dom Bellini Sent:
 Sunday, June 14, 2015 11:36 To: CCP4BB@JISCMAIL.AC.UK Subject:
 Re: [ccp4bb] Residual density feature
 
 Dear Colin,
 
 
 
 I believe people usually refer to it as unidentified blob when
 depositing/writing in
 these
 cases. But I wonder whether others may suggest better options.
 
 
 
 Best,
 
 
 
 D
 
  From: CCP4 bulletin board
 [CCP4BB@JISCMAIL.AC.UK] on behalf of Colin Levy 
 [c.l...@manchester.ac.uk] Sent: 14 June 2015 09:53 To: ccp4bb 
 Subject: [ccp4bb] Residual density feature
 
 Dear all,
 
 I am currently working on a structure that contains a residual
 density feature located
 within
 the active site. Due to a combination of factors including
 limited occupancy, modest resolution, twinning etc it has not
 been possible to unambiguously identify this feature
 despite
 fairly extensive efforts.
 
 What is the best way of dealing with such a feature when
 depositing the structure?
 Ideally I
 would like to draw attention to the presence of residual density
 whilst not implying
 that I have
 been able to identify it.
 
 Many thanks,
 
 Colin
 
 
 Manchester Protein Structure Facility
 
 Dr. Colin W. Levy MIB G034 Tel.  0161 275 5090 Mob.07786 197 554 
 c.l...@manchester.ac.ukmailto:c.l...@manchester.ac.uk
 
 


Re: [ccp4bb] Alternative ways to get electron density map other than EDS server

2015-06-09 Thread Dale Tronrud
   In addition to the other excellent suggestions you have received, you
can download the map for a re-refined version of a PDB entry at
PDB-Redo.  The latest Coot has a button for that.

   It appears that the EDS is down.  I'll notify the authorities.

Dale Tronrud

On 6/9/2015 1:11 PM, Xiao Lei wrote:
 Hi All,
 
 I am trying to get electron density map of some pdb structures, I know
 there is a database called Electron density server (EDS
 http://eds.bmc.uu.se/eds). But somehow these days I can not connect to
 the website and I keep getting the This webpage is not available
 message in my browser (Internet connection is ok).  I also tried to go
 to PDB databank, search a structure and click EDS link, but this did
 not connect to the server neither.
 
 Are there other ways to get electron density maps?
 
 Thanks.
 
 
 Xiao


Re: [ccp4bb] PyMOL v. Coot map 'level'

2015-06-01 Thread Dale Tronrud

   You are correct about this.  The magnitude of this effect depends
on the amount of solvent in your crystal (solvent regions tend to be
flatter), but it rarely reaches a factor of two.

   This does point out a serious flaw with contouring a map relative
to its rms (one of many).  If you calculate a map that just covers the
interesting part, it probably contains a lot of features.  The rms
calculated from that region of the map alone will be larger and your
features will look smaller.  If instead you set a contour level in
e/A^3, the appearance of the image will not depend on the region of
space covered by your map nor on the percentage of solvent in the crystal.
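
Just to make the point concrete, here is a toy Python/numpy sketch of my
own (it has nothing to do with how Coot or PyMOL work internally): the
same synthetic density gives a very different rms depending on whether
it is computed over the whole cell or only over a feature-rich region.

import numpy as np

rng = np.random.default_rng(0)
cell = rng.normal(0.0, 0.1, size=(64, 64, 64))       # mostly flat "solvent"
cell[20:40, 20:40, 20:40] += rng.normal(0.0, 1.0,    # feature-rich "protein" block
                                        size=(20, 20, 20))

region = cell[20:40, 20:40, 20:40]                   # map covering only the interesting part

print("rms over the whole cell:", round(float(cell.std()), 3))
print("rms over the region    :", round(float(region.std()), 3))
# The same absolute contour (in e/A^3) corresponds to a much lower
# "sigma" level when the rms comes from the feature-rich region alone.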

Dale Tronrud

On 6/1/2015 9:16 AM, Thomas Holder wrote:
 Hi Emilia et al.,
 
 I tried to figure out the PyMOL vs. Coot normalization discrepancy
 a while ago. As far as I remember, PyMOL normalizes on the raw data
 array, while Coot normalizes across the unit cell. So if the data
 doesn't exactly cover the cell, the results might be different.
 
 Cheers, Thomas
 
 On 01 Jun 2015, at 11:37, Emilia C. Arturo (Emily)
 ec...@drexel.edu wrote:
 One cannot understand what is going on without knowing how this
 map was calculated.  Maps calculated by the Electron Density
 Server have density in units of electron/A^3 if I recall, or at
 least its best effort to do so.
 
 This is what I was looking for! (i.e. what the units are) Thanks.
 :-) Yes, I'd downloaded the 2mFo-DFc map from the EDS, and got
 the same Coot v. PyMOL discrepancy whether or not I turned off
 the PyMOL map normalization feature.
 
 If you load the same map into Pymol and ask it to normalize the 
 density values you should set your contour level to Coot's rmsd
 level. If you don't normalize you should use Coot's e/A^3 level.
 It is quite possible that they could differ by a factor of two.
 
 This was exactly the case. The map e/A^3 level (not the rmsd
 level) in Coot matched very well, visually, the map 'level' in
 PyMOL; they were roughly off by a factor of 2.
 
 I did end up also generating a 2mFo-DFc map using phenix, which
 fetched the structure factors of the model in which I was
 interested. The result was the same (i.e. PyMOL 'level' = Coot
 e/A^3 level ~ = 1/2 Coot's rmsd level) whether I used the CCP4
 map downloaded from the EDS, or generated from the structure
 factors with phenix.
 
 Thanks All.
 
 Emily.
 
 
 
 Dale Tronrud
 
 On 5/29/2015 1:15 PM, Emilia C. Arturo (Emily) wrote:
 Hello. I am struggling with an old question--old because I've
 found several discussions and wiki bits on this topic, e.g. on
 the PyMOL mailing list 
 (http://sourceforge.net/p/pymol/mailman/message/26496806/ and 
 http://www.pymolwiki.org/index.php/Display_CCP4_Maps), but the 
 suggestions about how to fix the problem are not working for
 me, and I cannot figure out why. Perhaps someone here can
 help:
 
 I'd like to display (for beauty's sake) a selection of a model
 with the map about this selection. I've fetched the model from
 the PDB, downloaded its 2mFo-DFc CCP4 map, loaded both the map
 and model into both PyMOL (student version) and Coot (0.8.2-pre
 EL (revision 5592)), and decided that I would use PyMOL to make
 the figure. I notice, though, that the map 'level' in PyMOL is
 not equivalent to the rmsd level in Coot, even when I set
 normalization off in PyMOL. I expected that a 1.0 rmsd level in
 Coot would look identical to a 1.0 level in PyMOL, but it does
 not; rather, a 1.0 rmsd level in Coot looks more like a 0.5
 level in PyMOL. Does anyone have insight they could share about
 the difference between how Coot and PyMOL loads maps? Maybe the
 PyMOL 'level' is not a rmsd? is there some other normalization
 factor in PyMOL that I should set? Or, perhaps there is a
 mailing list post out there that I've missed, to which you
 could point me. :-)
 
 Alternatively, does anyone have instructions on how to use Coot
 to do what I'm trying to do in PyMOL? In PyMOL I displayed the
 mesh of the 2Fo-Fc map, contoured at 1.0 about a
 3-residue-long 'selection' like so: isomesh map, My_2Fo-Fc.map,
 1.0, selection, carve=2.0, and after hiding everything but the
 selection, I have a nice picture ... but with a map at a level
 I cannot interpret in PyMOL relative to Coot :-/
 
 Regards, Emily.


Re: [ccp4bb] PyMOL v. Coot map 'level'

2015-05-29 Thread Dale Tronrud

   One cannot understand what is going on without knowing how this map
was calculated.  Maps calculated by the Electron Density Server have
density in units of electrons/A^3 if I recall, or at least that is its
best effort.  There are many problems with getting this exact and
I don't think the server does anything fancy to overcome them.  When
Coot reads the file it calculates the rms deviation from 0.0 and
reports the contour level both relative to that rmsd and in raw units
(whatever they were; Coot couldn't care less).

   Some programs normalize the maps they calculate so that the rmsd
will be equal to 1.0.  When Coot reads such a map its two contour level
indicators will be the same.

   If you load the same map into PyMOL and ask it to normalize the
density values you should set your contour level to Coot's rmsd level.
If you don't normalize you should use Coot's e/A^3 level.  It is
quite possible that they could differ by a factor of two.
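
If you ever need to check what scale a particular map file is on, a
short Python sketch with the gemmi library (my own; the file name is
made up) will report the numbers Coot is working from:

import numpy as np
import gemmi

m = gemmi.read_ccp4_map('my_2mFo-DFc.ccp4')       # hypothetical file name
values = np.array(m.grid, copy=False)

rms_about_zero = float(np.sqrt(np.mean(values * values)))
print("mean density        :", float(values.mean()))
print("rms deviation from 0:", rms_about_zero)
# A "1 sigma" contour in Coot corresponds to this rms in whatever raw
# units the map was written in (e/A^3 for EDS maps, dimensionless for
# maps that were normalized to rmsd = 1).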

Dale Tronrud

On 5/29/2015 1:15 PM, Emilia C. Arturo (Emily) wrote:
 Hello. I am struggling with an old question--old because I've found
 several discussions and wiki bits on this topic, e.g. on the PyMOL
 mailing list 
 (http://sourceforge.net/p/pymol/mailman/message/26496806/ and
 http://www.pymolwiki.org/index.php/Display_CCP4_Maps), but the
 suggestions about how to fix the problem are not working for me, 
 and I cannot figure out why. Perhaps someone here can help:
 
 I'd like to display (for beauty's sake) a selection of a model with
 the map about this selection. I've fetched the model from the PDB, 
 downloaded its 2mFo-DFc CCP4 map, loaded both the map and model
 into both PyMOL (student version) and Coot (0.8.2-pre EL (revision
 5592)), and decided that I would use PyMOL to make the figure. I
 notice, though, that the map 'level' in PyMOL is not equivalent to
 the rmsd level in Coot, even when I set normalization off in PyMOL.
 I expected that a 1.0 rmsd level in Coot would look identical to a
 1.0 level in PyMOL, but it does not; rather, a 1.0 rmsd level in
 Coot looks more like a 0.5 level in PyMOL. Does anyone have insight
 they could share about the difference between how Coot and PyMOL
 loads maps? Maybe the PyMOL 'level' is not a rmsd? is there some
 other normalization factor in PyMOL that I should set? Or, perhaps
 there is a mailing list post out there that I've missed, to which
 you could point me. :-)
 
 Alternatively, does anyone have instructions on how to use Coot to
 do what I'm trying to do in PyMOL? In PyMOL I displayed the mesh of
 the 2Fo-Fc map, contoured at 1.0 about a 3-residue-long
 'selection' like so: isomesh map, My_2Fo-Fc.map, 1.0, selection,
 carve=2.0, and after hiding everything but the selection, I have a
 nice picture ... but with a map at a level I cannot interpret in
 PyMOL relative to Coot :-/
 
 Regards, Emily.


Re: [ccp4bb] 3BDN, 16.5% Ramachandran Outliers!!!!!

2015-04-27 Thread Dale Tronrud

   This particular model was deposited in early December of 2014, so
the authors had the validation report in hand before publication and,
I presume, could (should) have passed it on to the reviewers.  The
release date of the entry has nothing to do with the availability of
the validation report to reviewers.

   The model's validation report does have low percentiles on the
clashscore and RSRZ outliers but is far better than most 3.5A models
for Ramachandran outliers.  I'm not sure what your objection to this
model is.  Did you look at the fit to density and have some specific
criticism of the quality of workmanship?

   As for the PDB releasing a model containing outliers: The PDB is an
archive, not a gatekeeper.  If you, and your publisher, want to put
your name on it the PDB will store it so that others can look at the
model/data and judge for themselves.

Dale Tronrud

On 4/27/2015 7:04 AM, Oganesyan, Vaheh wrote:
 Hi Robbie and Co,
 
 
 
 These things are happening now too. Look at the entry 4x4m. The
 paper got published in January, PDB released coordinates in April.
 That means reviewers did not have a chance to look even at
 validation report. In my opinion, whatever it is worth, every
 journal dealing with crystal structures should, at the very least,
 request the validation report from PDB including Nature, Science
 and PNAS.
 
 What is also interesting that at the end PDB released the
 coordinates with large number of outliers. I don’t think those can
 be justified with low resolution of the data.
 
 
 
 Regards,
 
 Vaheh Oganesyan
 
 www.medimmune.com
 
 
 
 *From:*CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] *On
 Behalf Of *Robbie Joosten *Sent:* Friday, April 24, 2015 2:37 AM 
 *To:* CCP4BB@JISCMAIL.AC.UK *Subject:* Re: [ccp4bb] 3BDN, 16.5%
 Ramachandran Outliers!
 
 
 
 The PDB_REDO entry for 3bdn was pretty old, so I replaced it using
 a newer version of PDB_REDO that can use nucleic acid restraints
 from LibG: http://www.cmbi.ru.nl/pdb_redo/bd/3bdn/index.html.
 Obviously, the new structure model is far from brilliant (PDB_REDO
 doesn't rebuild at this resolution), but Molprobity seems to like
 it quite a bit more.
 
 I agree with the replies so far in that: - The topic starter was
 rather blunt and could have been more subtle. He should probably go
 work in the Netherlands ;) - Building structure models at 3.9A is
 incredibly difficult. - The tools we have now are much better than
 in 2008.
 
 However, we should not act like 2008 were still in the dark ages
 of crystallography. There are a lot of good structures available
 from that time (and also from long before) even at that resolution.
 That is not surprising seeing that we also already had very good
 building and refinement tools available. We also had enough
 validation tools available to tell us that this particular
 structure model isn't very good. I really believe that a good
 crystallographer that was not pressed for time (or at least didn't
 rush) could have done better with the data and the tools
 available.
 
 I'm now going to hide behind an asbestos wall to say this: The
 manuscript was submitted in July 29th 2007, the PDB entry was 
 deposited November 15th 2007. That means that the referees probably
 did not have a chance to see the finished structure model, at least
 not in the first pass. This implies that the authors didn't want to
 deposit the model on time. There are a whole lot of excuses for
 this, that are fortunately dealt with now 
 (http://onlinelibrary.wiley.com/doi/10.1107/S0907444913029168/abstract),
but the referees could have been a bit more critical. They should have
 at least seen that the supplemental table 1 did not show any 
 Ramachandran statistics. We can only speculate what happened. I'm 
 guessing that the authors didn't finish the structure yet and
 rushed the publication through to avoid being scooped or for the
 general glory of a Nature paper. To bad that came at the expense of
 the crystallography.
 
 Cheers, Robbie
 
 

  Date: Thu, 23 Apr 2015 18:43:13 + From:
 kell...@janelia.hhmi.org mailto:kell...@janelia.hhmi.org Subject:
 Re: [ccp4bb] 3BDN, 16.5% Ramachandran Outliers! To:
 CCP4BB@JISCMAIL.AC.UK mailto:CCP4BB@JISCMAIL.AC.UK
 
 Is it in pdb redo? Take a look here: 
 http://www.cmbi.ru.nl/pdb_redo/bd/3bdn/index.html
 
 
 
 JPK
 
 
 
 *From:*CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] *On
 Behalf Of *Misbah ud Din Ahmad *Sent:* Thursday, April 23, 2015
 2:28 PM *To:* CCP4BB@JISCMAIL.AC.UK mailto:CCP4BB@JISCMAIL.AC.UK 
 *Subject:* Re: [ccp4bb] 3BDN, 16.5% Ramachandran Outliers!
 
 
 
 Dear Phoebe A. Rice,
 
 I didn't mean to discredit the work but the statistics of the
 structure just shocked me at the first instance. I could for
 example point out to another structure 1ZR2, which has the same
 resolution (the protein Molecular

Re: [ccp4bb] Phaser going into infinite loop in Ample

2015-04-22 Thread Dale Tronrud

   Thanks for all the help!

   We will restart the job with the KILL option as Jens suggested.

   We will also send a copy of the Phaser log file to Randy.  This is
not a case of Phaser simply trying harder - it is doing the same
search over and over again.  After four days the log file is getting
pretty long.  I may have to use DropBox.

   We considered trying ARCIMBOLDO but the resolution of the data did
not reach the 2.1 A limit usually suggested for that program.  If we
get AMPLE running, and it fails, we will give ARCIMBOLDO a try.

Dale Tronrud
Sarah Clark

On 4/22/2015 2:06 AM, Thomas, Jens wrote:
 Dear Dale,
 
 This is a known issue with AMPLE and will be fixed with the next
 release.
 
 In the meantime you can tell AMPLE to pass the KILL option that
 Randy mentions to PHASER, by adding the following arguments to your
 script:
 
 -mr_keys PKEY KILL TIME 360
 
 this will kill PHASER after 360 minutes (6 hours), which we've
 found if normally enough, although pick whatever time works for
 you,
 
 I should also point out that I think George is doing a disservice
 to SHELXE. In our last paper looking at coiled-coils, we saw
 successes at resolutions much lower than 2.1, in one case, AMPLE
 was able to solve a structure with a resolution of 2.9A:
 
 http://dx.doi.org/10.1107/S2052252515002080
 
 If you have any issues getting the KILL command to work, please
 feel free to contact me off-list.
 
 Best wishes,
 
 Jens
 
  From: CCP4 bulletin board
 [CCP4BB@JISCMAIL.AC.UK] on behalf of Randy Read [rj...@cam.ac.uk] 
 Sent: 22 April 2015 09:04 To: CCP4BB@JISCMAIL.AC.UK Subject: Re:
 [ccp4bb] Phaser going into infinite loop in Ample
 
 Hi Dale,
 
 It must actually be AMPLE deciding how many copies to search for.
 Phaser will give you some information about how consistent the
 specified composition is with the Matthews volume, but it just
 searches for the number of copies that it's instructed to look for.
 We haven't put the intelligence into it to correlate the model(s)
 with the composition information and try out different
 possibilities.  At the moment, we're leaving that level of analysis
 to pipelines like MRage.
 
 We're well aware of the tension between looking hard enough to find
 a solution in a difficult case and not throwing good CPU cycles
 after bad when it's hopeless.  We're gradually introducing new
 features to make these decisions better, but we do tend to prefer
 wasting CPU time to missing solutions.  However, we've introduced a
 couple of ways to limit the amount of time that Phaser spends
 pursuing very difficult or hopeless solutions, partly for the
 benefit of people such as the developers of Arcimboldo, AMPLE and
 the wide-search molecular replacement pipeline.  One is the KILL
 command, which is a rather blunt instrument saying to give up if a
 solution isn't found in a certain length of time.  In AMPLE, if you
 set quick mode, then the KILL time is set to 15 minutes.  Another
 option (which I don't think AMPLE uses) is the PURGE command, where
 you can say (for instance) that Phaser should pursue a maximum of
 25 partial solutions when adding the next component.
 
 If you're seeing an infinite loop, it would be handy if you could
 send me a copy of the logfile showing what is going on.  There have
 been some bugs leading to such infinite loops under some
 circumstances, and if you're running into one of those there's a
 good chance that it has been fixed in a recent nightly build of
 Phaser available through Phenix.  You can instruct CCP4 to use the
 Phaser executable from Phenix, and I think this should work fine in
 AMPLE, though I haven't tested it — I don't think any relevant
 syntax has changed.  It's been a while since we've had a new stable
 release of Phaser in either CCP4 or Phenix, so we're aiming to get
 one out in the not too distant future.
 
 All the best,
 
 Randy
 
 On 22 Apr 2015, at 05:56, Dale Tronrud de...@daletronrud.com
 wrote:
 
 
 We are having a problem with AMPLE and hope someone can help.
 
 The protein is about 70 amino acids long and we suspect it forms a 
 coiled-coil.  Our previous attempts at molecular replacement have 
 failed so we hoped that AMPLE, with its ability to generate a
 variety of potential models, would do the trick.
 
 Our problem is that all of our CPU cores are consumed by Phaser 
 jobs that are not making progress.  With this protein Phaser
 decides that it will look for 11 copies in the asymmetric unit.
 For a few of the possible ensembles it fails to find even one copy
 and gives up. That's fine with us.  For other ensembles it finds a
 handful of possible first positions, goes on to look for a second
 and fails, then goes back to try to place a second copy again.  We
 presume that the intent is to lower the acceptance criteria in the
 second pass, but in actuality Phaser simply repeats the same search
 that failed before and fails again

[ccp4bb] Phaser going into infinite loop in Ample

2015-04-21 Thread Dale Tronrud

   We are having a problem with AMPLE and hope someone can help.

   The protein is about 70 amino acids long and we suspect it forms a
coiled-coil.  Our previous attempts at molecular replacement have
failed so we hoped that AMPLE, with its ability to generate a variety
of potential models, would do the trick.

   Our problem is that all of our CPU cores are consumed by Phaser
jobs that are not making progress.  With this protein Phaser decides
that it will look for 11 copies in the asymmetric unit.  For a few of
the possible ensembles it fails to find even one copy and gives up.
That's fine with us.  For other ensembles it finds a handful of
possible first positions, goes on to look for a second and fails, then
goes back to try to place a second copy again.  We presume that the
intent is to lower the acceptance criteria in the second pass, but in
actuality Phaser simply repeats the same search that failed before and
fails again.  This leads to an infinite loop.

   Once all the cores are occupied in this futile endeavor AMPLE makes
no further progress.

   How can we get Phaser to either try harder to place a molecule or
to give up?

   We are using CCP4 6.5.008 and the copy of Phaser that came with it.
 We used CCP4i to create a script which we modified slightly and ran
using the at command.  The command is:

/usr/local/ccp4-6.5/bin/ccp4-python -u /usr/local/ccp4-6.5/bin/ample
-mtz /user/sarah/xray/1Apr_Athena/SD6004_2_001_mergedunique14.mtz
-fasta /user/sarah/xray/1Apr_Athena/swaseq.fa -mr_sequence
/user/sarah/xray/1Apr_Athena/swaseq.fa -nmodels 500 -name MVD0
-run_dir /home/sarah/AMPLE -nproc 6 -make_models True -rosetta_dir
/usr/local/rosetta-3.5 -frags_3mers
/user/sarah/xray/1Apr_Athena/aat000_03_05.200_v1_3 -frags_9mers
/user/sarah/xray/1Apr_Athena/aat000_09_05.200_v1_3 -make_frags False
-F F -SIGF SIGF -FREE FreeR_flag -early_terminate True -use_shelxe
True -shelx_cycles 15 -use_arpwarp False

Any help is appreciated,
Dale Tronrud
Sarah Clark


Re: [ccp4bb] Sortwater NCS Matrix input

2015-04-01 Thread Dale Tronrud
   I think you are on the right track - there are not enough decimal
places in your matrix elements to pass the orthonormal test.  This test
checks that the squared length of each row (x^2+y^2+z^2) is equal to one
and that the dot product of each row with every other row is equal to
zero.  If the values on your NCS statement are in row order I calculate
0.999337 for the squared length of the first row.  If the program is
testing whether this is equal to one to four decimal places, you lose.

   You have to add more digits, but just adding zeros isn't going to
accomplish much.  The best solution is to get your NCS program to report
its matrix with more digits -- three is pitiful.  Failing that, you could
calculate one element of each row from the other two to ensure the
length is equal to one at a higher level of precision and hope this
doesn't mess up the dot product test.  You'll end up with one number in
each row having more than three decimal places.
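
For what it's worth, a small Python/numpy sketch (mine, not part of
sortwater) shows how far the three-decimal matrix misses the test, and
one way to repair it - projecting onto the nearest orthogonal matrix
with an SVD - although getting full-precision output from the
superposition program is still the cleaner fix:

import numpy as np

# rotation part of the NCS operator, as given to three decimal places
R = np.array([[ 0.072,  0.997, -0.012],
              [ 0.991, -0.073, -0.113],
              [-0.113, -0.004, -0.994]])

print(R @ R.T)             # should be the identity; rounding makes it miss by ~1e-3

# nearest orthogonal matrix (polar decomposition via SVD)
U, _, Vt = np.linalg.svd(R)
R_fixed = U @ Vt
print(R_fixed @ R_fixed.T) # orthonormal to machine precision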

Dale Tronrud

On 4/1/2015 2:52 PM, Shane Caldwell wrote:
 Hi ccp4bb,
 
 I'm trying to solve a problem I never quite figured out in the past. I'd
 like to use the *sortwater* utility to send my picked waters to various
 protein chains, and to give them the same residue number if they are
 NCS-equivalent, as the manual outlines.
 
 http://www.ccp4.ac.uk/html/sortwater.html
 
 The first part goes off perfectly, partitioning the waters into their
 respective chains. Where I run into problems is bringing in NCS. I can't
 get the program to recognize the transformation matrix. I can calculate
 the matrix using *superpose*, and manually input these (limited
 precision) values into my script, which looks like:
 
 NCS B C MATRIX 0.072 0.997 -0.012 0.991 -0.073 -0.113 -0.113 -0.004
 -0.994 37.502 -35.283 81.276
 
 and it returns
 
  WARNING:   Matrix is not orthonormal 
 
 
 My linear algebra is very limited, and I don't know exactly what this
 means in the context of this program, though I suspect it could be
 either linked to converting to fractional coordinates (I'm in a
 monoclinic system), or a product of the limited precision of the matrix
 values.
 
 Using the identity matrix, like so:
 
 NCS B C MATRIX 1.0 0.0 0.0 0.0 1.0 0.0 0.0
 0.0 1.0 0.000 0.000 0.000
 
 doesn't trigger the warning. These values have more digits, but adding
 extra zeroes to the original matrix as a very crude workaround still
 returns the error.
 
 So, I'm now stuck trying to parse what's going on. I know *sortwater*
 also takes O datablocks as matrix input, and that's something I could
 look into (especially if calculating in a different program might get me
 better precision). Although, I'm not sure the format is a factor given
 the identity matrix is accepted as a keyword input.
 
 Skimming the archives, I get the sense this isn't something that many
 users do any more. I have quite a few structures with hundreds of waters
 each and I'd like to get the waters organized, but doing it by hand will
 take a very long time. Any help getting this program running would be
 greatly appreciated!
 
 
 Shane Caldwell
 McGill University


Re: [ccp4bb] Peptide flips in electron density?

2015-02-12 Thread Dale Tronrud

   I'm a little confused.  If you see from the density that a peptide
flip is required, can't you just fix it?  Or is it the case that the
density is telling you that both conformations are present?  In that
case build alternative conformations and refine the occupancy.

Dale Tronrud

On 2/12/2015 9:07 AM, Kimberly Stanek wrote:
 Hello all,
 
 I am in some of the final stages of refinement of a 1.5 A
 resolution oligomeric protein and I'm noticing what appears to be
 extra density corresponding to a carbonyl peptide flip near the
 N-terminus of one of the chains. I was wondering if anyone had any
 experience with this before and if so, could point me to any
 relevant literature or references.
 
 Thanks, Kim
 
 -- Kimberly Stanek Graduate Student Mura Lab Department of
 Chemistry University of Virginia (434) 924-7979


Re: [ccp4bb] Free Reflections as Percent and not a Number

2014-11-21 Thread Dale Tronrud


On 11/21/2014 12:35 AM, F.Xavier Gomis-Rüth wrote:
 snip...
 
 As to the convenience of carrying over a test set to another
 dataset, Eleanor made a suggestion to circumvent this necessity
 some time ago: pass your coordinates through pdbset and add some
 noise before refinement:
 
 pdbset xyzin xx.pdb xyzout yy.pdb eof noise 0.4 eof
 

   I've heard this debiasing procedure proposed before, but I've
never seen a proper test showing that it works.  I'm concerned that
this will not erase the influence of low resolution reflections that
were in the old working set but are now in the new test set.  While
adding 0.4 A Gaussian noise to a model would cause large changes to
the 2 A structure factors, I doubt it would do much to those at 10 A.

   It seems to me that one would have to have random, but correlated,
shifts in atomic parameters to affect the low resolution data - waves
of displacements, sometimes to the left and other times to the right.
 You would need, of course, a superposition of such waves that span
all the scales of resolution in the data set.

   Has anyone looked at the pdbset jiggling results and shown that the
low resolution data are scrambled?
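
A back-of-the-envelope calculation (my own sketch, assuming uncorrelated
Gaussian shifts with an rms of 0.4 A along each coordinate, which may
not be exactly the convention pdbset uses) suggests the low resolution
terms are barely touched:

import math

sigma = 0.4   # assumed rms shift per coordinate, in Angstrom

def damping(d):
    # expected attenuation of <F> for uncorrelated Gaussian coordinate
    # noise: exp(-2 pi^2 sigma^2 / d^2), a Debye-Waller-like factor
    return math.exp(-2.0 * math.pi ** 2 * sigma ** 2 / d ** 2)

for d in (2.0, 3.0, 5.0, 10.0):
    print(f"d = {d:4.1f} A   damping ~ {damping(d):.2f}")

With those numbers the 2 A terms are damped to roughly half their value
while the 10 A terms keep about 97% of theirs, which is exactly my
concern about the bias at low resolution.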

Dale Tronrud

 Xavier
 
 On 20/11/14 11:43 PM, Keller, Jacob wrote:
 Dear Crystallographers,
 
 I thought that for reliable values for Rfree, one needs only to
 satisfy counting statistics, and therefore using at most a couple
 thousand reflections should always be sufficient. Almost always,
 however, some seemingly-arbitrary percentage of reflections is
 used, say 5%. Is there any rationale for using a percentage
 rather than some absolute number like 1000?
 
 All the best,
 
 Jacob
 
 *** Jacob Pearson Keller,
 PhD Looger Lab/HHMI Janelia Research Campus 19700 Helix Dr,
 Ashburn, VA 20147 email: kell...@janelia.hhmi.org 
 *** .
 
 
 --


Re: [ccp4bb] Free Reflections as Percent and not a Number

2014-11-20 Thread Dale Tronrud

   I think it is just a matter of convenience when changing resolution
limits and trying to use the same test set.  If you always want 1000
but your next crystal doesn't diffract as well, you have to select a
few more reflections and they will have been in the working set
previously.  If the next crystal diffracts better, you have to convert
some of your old test set to working set so you can add some high res
reflections to the test set.  If you stick with percentages life is
simpler.  The cost, of course, is that you often have more than 1000
reflections in the test set which, while giving you a more reliable
free R, cause a larger degradation in the model itself.

Dale Tronrud

On 11/20/2014 2:43 PM, Keller, Jacob wrote:
 Dear Crystallographers,
 
 I thought that for reliable values for Rfree, one needs only to
 satisfy counting statistics, and therefore using at most a couple
 thousand reflections should always be sufficient. Almost always,
 however, some seemingly-arbitrary percentage of reflections is
 used, say 5%. Is there any rationale for using a percentage rather
 than some absolute number like 1000?
 
 All the best,
 
 Jacob
 
 *** Jacob Pearson Keller,
 PhD Looger Lab/HHMI Janelia Research Campus 19700 Helix Dr,
 Ashburn, VA 20147 email: kell...@janelia.hhmi.org 
 ***
 


Re: [ccp4bb] [phenixbb] Calculate average B-factor?

2014-10-06 Thread Dale Tronrud


On 10/6/2014 4:20 AM, Tim Gruene wrote:
 Dear Jose,
 
 the question came up again because I did not receive an answer to 
 my question. The thread discussed benefits and malefits of PDB vs. 
 mmCIF, which was not my question. This time, Nat Echolls gave a 
 very reasonable answer (at least for phenix) on the phenixbb,
 i.e., that there are no plans to abandon the PDB format (as
 working format), but very likely a smooth transition will take
 place - I guess this will be more slowly than the enforcement of
 the PDB to upload PDBx/mmCIF files for archiving. I agree that for
 archiving mmCIF is a reasonable format, but I guess less than 1% of
 all structures in the PDB hit the limits of the PDB format.
 
That's odd. I've found that just about every structure I've worked
on in the last couple of decades could not be expressed in the
PDB format without loss of information. A primary example? Try
expressing a pair of side chains that have alternative conformations in
a PDB file. Okay, one conformation is A and the other is B. That
allows me a total of twelve pairs of side chains before I run out of
upper-case letters. Most people hack their model by reusing A and
B, but of course that is ambiguous about where you mean the A's are
the same and where they are different. A realistic model of the
surface of a protein cannot be expressed in the PDB format.

How many models are refined with TLS B factors? There still is no
way to describe TLS in the PDB format. Don't tell me it's stuffed in
REMARK! What kind of a file format is that?

I believe that 100% of the models that we should be building can't
be described in the PDB file format, and that has been true for a
great many years.

Dale Tronrud

 I greatly appreciate Nat's answer and I would appreciate an answer 
 from the responsibles for the other refinement programs.
 
 Best, Tim
 
 On 10/05/2014 08:05 PM, Jose Manuel Duarte wrote:
 Thanks Frances for the explanation. Indeed mmCIF format is a lot 
 more complicated and grep can be a dangerous tool to use with 
 them. But for most cases it can do the job and thus it maintains 
 some sort of backwards compatibility. I can't agree more that 
 using specialised tools (for either PDB files or mmCIF files) 
 that deal with the formats properly is the best solution (see
 for instance http://mmcif.wwpdb.org/docs/software-resources.html
 for some of the mmCIF readers).
 
 In any case I find it most surprising that this topic came yet 
 again to this BB, when it was thoroughly discussed last year in 
 this thread:
 
 https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1308L=ccp4bbD=0P=26939
 
I'm not sure why these kinds of urban legends on the evilness of the mmCIF
 format keep coming back to the list...
 
 As explained there and elsewhere endless times, the PDB format is
 inadequate to represent the complexity of macromolecules and has
 been needing a replacement for a long time. The decision to move
 on to mmCIF has been made and in my opinion the sooner we move
 forward the better.
 
 Cheers
 
 Jose
 
 
 
 On 05.10.2014 15:52, Frances C. Bernstein wrote:
 mmCIF is a very general format with tag-value pairs, and loops
  so that tags do not need to be repeated endlessly.  It was 
 designed so that there is the flexibility of defining new terms
 easily and presenting the data in any order and with any kind
 of spacing.
 
 I understand that there are 10+ files in cyberspace 
 prepared by the PDB and that they all have the 'same' format.
 
 It is tempting to write software that treats these files as 
 fixed format and hope that all software packages that generate 
 coordinate files will use the same fixed format.  But that 
 loses the generality and flexibility of mmCIF, and software 
 written that way will fail when some field requires more 
 characters or a new field is added. There are software tools
 to allow one to read and extract data from any mmCIF file;
 using these is more complicated than using grep but using
 these assures that one's software will not fail when it
 encounters a date file that is not exactly what the PDB is
 currently producing.
 
 Note that mmCIf was defined when the limitations of the 
 fixed-format PDB format became apparent with large structures. 
 Let's not repeat the mistakes of the past.
 
 Frances
 
 =  
 Bernstein + Sons *   *   Information Systems Consultants 
 5 Brewster Lane, Bellport, NY 11713-2803 *   * ***
  *Frances C. Bernstein *   *** 
 f...@bernstein-plus-sons.com *** * *   *** 1-631-286-1339 
 FAX: 1-631-286-1999 
 =
 
 On Sun, 5 Oct 2014, Tim Gruene wrote:
 
 Hi Jose,
 
 I see. In the example on page 
 http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Categories/atom_site.html,
it is in field 12, though, and I would have thought that mmCIF allows
 line breaks

Re: [ccp4bb] software or server to validate ligand density

2014-09-22 Thread Dale Tronrud
On 9/19/2014 2:29 PM, Eleanor Dodson wrote:
 hmm - crystallographically difficult. The usual way is to make a
 dictionary file from the chemical information about the ligand. try to
 build something obeying the chemical restraints into the density -
 refine those coordinates and validate them.
 Eleanor
 

   I would be a lot more cautious than this. It is good to remember that
the density for a fully occupied, ordered ligand will be as strong and
clear as that of the protein in the active site.  The fact that you have
done some refinement but are still not sure if your ligand, or anything
for that matter, has bound means that at best your crystal has problems.

   You have two questions.  Did anything bind during your soak?  Did the
ligand of interest bind?  The first you can attack fairly unambiguously
with an Fo(holo)-Fo(apo) map.  I presume you have an isomorphous apo
data set since you are performing a soaking experiment.  A clear set of
density that stands out above the background in the Fo-Fo map tells you
that something new is in your crystal.

   Figuring out what it is is another matter and is very difficult if
the density is weak.  Eleanor is recommending building and validating.
Unfortunately none of the validation tools we have have hard cutoffs of
what is acceptable and what is not.  With years of experience one can
come to a conclusion, but someone starting out doesn't really know if
those tests are definitive.  Remember you are not trying to decide if
your ligand bound, but if the thing that bound is your ligand.  There
are many other things that could happen.

   Maybe your ligand bound.  Maybe something else in your solution
bound.  Maybe the chemical in your solution wasn't what you thought it
was.  Maybe it was, but has been chemically transformed since then and
is now something else.  Maybe this particular crystal had something in
it all along that your apo crystal didn't.

   This isn't a matter of seeing IF you can build your ligand into the
density and get away with it.  You have to also say that you CAN'T build
anything (reasonable) in that density OTHER THAN your ligand.
With weak and fuzzy density this is very hard to do.

   If you performed a soak and got something to bind, maybe you could
soak longer or at higher concentration and get a more definitive map.

   If you have not solved the structure of a protein:ligand complex
before I suggest you check out other models in the PDB with good, strong
density and see what a full power ligand looks like in a map.  Only
settle for weak density if there is no alternative and never settle for
ambiguous density.

Dale Tronrud

 
 On 19 September 2014 22:06, ansuman biswas bubai_...@yahoo.co.in
 mailto:bubai_...@yahoo.co.in wrote:
 
 Dear All,
 
 I have collected a diffraction dataset from a crystal soaked in a
 solution containing the ligand of interest.
 After refining a few cycles, I can see some density in the active
 site pocket, but not so clear to model the ligand unambiguously.
 
 Is any tool available to validate whether the ligand is actually
 there or not ?
 The Twilight server appears to be for PDB files that have already
 been deposited.
 
 thanks and regards,
 Ansuman Biswas,
 dept. of Physics,
 Indian Institute of science
 
 


Re: [ccp4bb] Get NCS rotational axis from PDB file?

2014-09-22 Thread Dale Tronrud

   While I'm sure the complex eigenvalues of a transformation matrix
are very interesting, I've never run across a use for them in my work.
They seem to have meaning in a universe where complex values for
coordinates are possible.  That universe would have three space-like
and three time-like dimensions.  I have enough problems with the
on-rush of one dimension of time.
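
For anyone who wants to do this in practice, here is a minimal
Python/numpy sketch (mine, using a textbook 120-degree rotation as the
test matrix): the eigenvector belonging to the real eigenvalue of +1 is
the rotation axis, and the trace gives the angle.

import numpy as np

# a 120-degree rotation about the (1,1,1) direction, as a test case
R = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

w, v = np.linalg.eig(R)            # one eigenvalue is +1, the other two are complex
k = int(np.argmin(abs(w - 1.0)))   # pick the eigenvalue closest to +1
axis = np.real(v[:, k])
axis /= np.linalg.norm(axis)

angle = np.degrees(np.arccos((np.trace(R) - 1.0) / 2.0))
print("rotation axis :", axis)     # ~ (0.577, 0.577, 0.577)
print("rotation angle:", angle)    # 120 degrees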

Dale Tronrud.

On 9/22/2014 10:40 AM, Chen Zhao wrote:
 Hi Dale,
 
 Thank you for your reply! Yes, the real term of all eigenvalues are
 very close to 1, and as I said in the previous email they matches a
 2-fold rotational matrix. But I am just curious, would you mind if
 you give me a little bit of hint on what the imaginary space
 represent?
 
 Thank you so much, Chen
 
 On Mon, Sep 22, 2014 at 1:26 PM, Dale Tronrud
 de...@daletronrud.com mailto:de...@daletronrud.com wrote:
 
 
 On 9/22/2014 10:10 AM, Chen Zhao wrote:
 Hi all,
 
 I have a follow-up question here. I calculated the eigenvalues
 and the eigenvectors and some of them have imaginary terms. The
 real terms of the eigenvalues match a 2-fold rotation, but I am
 just wondering what the imaginary terms represent. It's been
 quite a while since I studied linear algebra.
 
 
 The eigenvalues for a rotation matrix will have one that is real 
 and the other two complex.  The eigenvector that corresponds to
 the real value is the rotation axis.  The other two are not useful
 for your purpose.
 
 By the way, that real eigenvalue has better be equal to one!
 
 Dale Tronrud
 
 Thank you so much, Chen
 
 On Mon, Sep 22, 2014 at 10:41 AM, Philip Kiser p...@case.edu
 mailto:p...@case.edu mailto:p...@case.edu
 mailto:p...@case.edu wrote:
 
 Cool. Glad to help.
 
 On Mon, Sep 22, 2014 at 10:34 AM, Chen Zhao c.z...@yale.edu
 mailto:c.z...@yale.edu mailto:c.z...@yale.edu
 mailto:c.z...@yale.edu wrote:
 
 Dear Philip,
 
 Please forgive me! Yes it is eigenvectors that I am looking for.
 I was deriving myself and came to the conclusion that
 R=A^(-1)R'A, but I just forgot it is eigenvectors, and I forgot
 what the eigenvector is originally for. Thank you so much!
 
 Sincerely, Chen
 
 On Mon, Sep 22, 2014 at 10:22 AM, Philip Kiser p...@case.edu
 mailto:p...@case.edu mailto:p...@case.edu
 mailto:p...@case.edu wrote:
 
 Hi Chen,
 
 Wouldn't the fold of the NCS be clear from the PDB file? You
 could use superpose to superimpose one monomer onto the next
 member of the NCS group, and then take the rotation matrix output
 from that program to calculate the eigenvectors for the
 transformation. The NCS axis is parallel to one of those
 eigenvectors.
 
 Philip
 
 Philip
 
 On Mon, Sep 22, 2014 at 10:13 AM, Chen Zhao c.z...@yale.edu
 mailto:c.z...@yale.edu mailto:c.z...@yale.edu
 mailto:c.z...@yale.edu wrote:
 
 Dear all,
 
 Is there a software that can print out the position and the fold
 of a NCS rotational axis from a PDB file? (just something like
 molrep self-rotation on a reflection file) I cannot use molrep
 because the RMSD between the two copies are too high, and I just
 want to cut a certain region for calculation. I don't know
 whether calculated Fc from the truncated PDB works in molrep
 self-RF, but I am thinking whether there is a more
 straightforward way.
 
 Thanks a lot in advance!
 
 Sincerely, Chen
 
 
 
 
 
 
 


Re: [ccp4bb] Bond lengths and angles used by Molprobity for ANP (AMPPNP)

2014-09-15 Thread Dale Tronrud

   In general the word "outlier" should not be read as "error".  It
simply means that this thing is interesting.  You have to judge, based
on all the information at hand, whether it is interesting because it is
rare or because it is wrong.

   It is true that the rmsds of high resolution models tend to be
higher than those of low resolution models.  The libraries we use are
distillations of high resolution structures and do not contain all of
the true variability of reality.  We have to use libraries when the
diffraction data are insufficient for the model to stand on its own,
but you have to keep in mind the library's limitations.

Dale Tronrud

On 9/15/2014 8:42 AM, C wrote:
 Isn't Molprobity too tight anyway for _high resolution_ structures?
 
 
 I have often (for proteins) found it highlighting outliers when
 in fact that is what the density shows.
 
 Isn't it a trade off between molprobity and what the measurements
 show, especially at high resolutions?
 
 
 -Original Message- *From:* merr...@u.washington.edu *Sent:*
 Mon, 15 Sep 2014 08:35:37 -0700 *To:* ccp4bb@jiscmail.ac.uk 
 *Subject:* Re: [ccp4bb] Bond lengths and angles used by Molprobity 
 for ANP (AMPPNP)
 
 On Monday, 15 September 2014 03:24:19 PM Andrew Leslie wrote:
 
 Does anyone know if Molprobity has recently changed the standard
 bond lengths and angles that it uses ?
 
 
 
 Molprobity is reporting errors in the C4-C5 bond length and the
 C5-C4-N3 bond angle (deviations of 8-10 sigma) for AMPPNP (monomer 
 code ANP) for a new structure refined with Refmac. I then tried 
 Molprobity on a deposited PDB structure that also contains AMPPNP 
 and it reported the same errors. I am sure that these errors were 
 not reported when this structure (2JDI) was deposited in 2007.
 
 
 
 This would suggest that the standard dictionary that Molprobity
 uses has changed, but I cannot find any reference to this on the 
 Molprobity pages.
 
 
 
 I would be very grateful if anyone can throw some light on this.
 
 
 
 I suspect, though I cannot say for sure, that the difference is
 
 whether ANP is [correctly] treated as a ligand or [incorrectly]
 
 treated as a monomer in a nucleic acid polymer. We have seen
 
 similar cases recently. For now I would say it is safe to
 
 disregard Molprobity scores for ligands that just happen to be
 
 nucleic acids.
 
 
 
 Ethan
 
 
 
 
 
 
 
 
 
 Thanks,
 
 
 
 Andrew
 
 
 
 --
 
 mail: Biomolecular Structure Center, K-428 Health Sciences Bldg
 
 MS 357742, University of Washington, Seattle 98195-7742
 
 

 


Re: [ccp4bb] TR: [ccp4bb] KRAS maps

2014-09-01 Thread Dale Tronrud
 

 Hi Nick, Robbie,
 
 I encountered the same problem. This is not a bug but the way the
data was deposited.
 They have deposited intensities. What I did to be able to look at
the maps:
 - Retrieve the structure factor file, which in fact contains intensities
 - Run it through Truncate to get Fs
 - Remove the ligand from the PDB
 - REFMAC still refused to read in the new mtz file, for a reason I
could not find. But BUSTER took it without problem
 - Then look at the electron density in COOT... if the ligand was
properly present, it should come clearly in the difference density
 
 
 Magali Mathieu Head of SB-X2S Sanofi LGCR - SDI Centre de Recherche
 de Vitry/Alfortville TÉL. : +33 (0)
1.58.93.39.90  -  FAX : +33 (0) 1.58.93.80.63 13, quai Jules Guesde –
BP 14 – 94403 Vitry-sur-Seine Cedex France
 

Dear Prof Mathieu,

   I don't think you have the right answer for these entries.  Many
entries in the PDB have intensities and the maps from the EDS are fine.
 I think the real problem lies at the end of the deposited mmCIFs.  When
you look there you see that each of these two entries has about 30,000
lines that look like

1 1 10   -1   11 x  ?   ?
1 1 10   -1   10 x  ?   ?
1 1 10   -19 x  ?   ?
1 1 10   -18 x  ?   ?

   These lines are odd because the "x" says that the reflection has a
weak or unreliably measured intensity while the "?" says that the
intensity could be anything.  This is sort of an internal contradiction
and makes the intent of the CIF's creator a little unclear.

   More damning is the observation that the mtz that Coot downloads from
the EDS contains values for the Fourier coefficients for these very
reflections!  While their amplitudes are small, with 30,000 of them,
they add up.

   I have been waiting for my contact at the EDS to return to the
Internet to get a definitive word but he is still disconnected from the
world.  Your post makes it clear that Refmac is having problems with
these reflections while Buster ignores them.  I have done the map
calculation in TNT w/o these reflections (of course) and the maps look
fine.  I'm guessing that PDB_REDO junks these reflections before its
Refmac refinement and avoids the issue.
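
If anyone wants to check their own copy of these files, a short Python
sketch with the gemmi library (mine; the file name is hypothetical)
counts the offending rows so you can decide whether to strip them
before refinement:

import gemmi

doc = gemmi.cif.read('r4lucsf.ent')     # local copy of the deposited SF file
block = doc.sole_block()
refln = block.find('_refln.', ['status', 'intensity_meas'])

n_bad = sum(1 for row in refln if row[0] == 'x' and row[1] == '?')
print('reflections flagged "x" with no measured intensity:', n_bad)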

Dale Tronrud

On 9/1/2014 2:00 AM, Magali Mathieu wrote:

 
 
 Please consider the environment before printing this email!
 
 
 -Message d'origine- De : CCP4 bulletin board
 [mailto:CCP4BB@JISCMAIL.AC.UK] De la part de Robbie Joosten Envoyé
 : vendredi 29 août 2014 15:48 À : CCP4BB@JISCMAIL.AC.UK Objet : Re:
 [ccp4bb] KRAS maps
 
 Hi Nick,
 
 Not sure what happened here, it seems like a bug in this EDS entry.
 You can try PDB_REDO for maps. In recent versions of COOT (version
 0.8*) the button for getting maps and a model is just under that
 for EDS. There is a plugin for older versions of COOT. They look
 okay, but the model has changed (on purpose).
 
 If you prefer looking at maps for an unmodified model you can
 download the mtz files here: 
 http://www.cmbi.ru.nl/pdb_redo/lu/4luc/4luc_0cyc.mtz.gz 
 http://www.cmbi.ru.nl/pdb_redo/lv/4lv6/4lv6_0cyc.mtz.gz
 
 Cheers, Robbie
 
 
 
 -Original Message- From: CCP4 bulletin board
 [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Nicholas Larsen Sent:
 Friday, August 29, 2014 15:36 To: CCP4BB@JISCMAIL.AC.UK Subject:
 [ccp4bb] KRAS maps
 
 Dear All, I frequently use the Coot feature Fetch PDB and MAP
 using EDS... with great success when peer reviewing literature
 reports.  However, when I try this for the recent KRAS structures
 deposited by Kevan Shokat and Jim Wells (Nature 2013), the Coot
 generated maps are garbage, although the resolution is better
 than 1.5 A.  Does anyone have an explanation?   I also checked
 with one kind colleague at another institute and she confirmed my
 problem using Linux platform (I am using Windows).
 
 See PDBs, 4LUC and 4LV6, for example.
 
 Cheers, Nick
 
 
 


Re: [ccp4bb] Calculating anomalous Fourier maps

2014-08-29 Thread Dale Tronrud

   Here is how I view this stuff.  In general the diffraction pattern
does not obey Friedel's law, which means that the actual electron
density map is a set of complex numbers.  The map that consists of
just the real part of these density values is what we call the normal
electron density map.  The purely imaginary part is the anomalous map.
We don't (yet) have FFT programs that calculate complex electron
density (although that is quite easy), nor do we have graphics programs
to display them, so we make do with two separate maps.  The normal
map is easy: just manipulate the structure factors to ensure
compliance with Friedel's law and calculate the map.

   The anomalous map is harder.  If you create coefficients that
purely disobey Friedel's law (F(h) = -F*(-h)) they will correspond to
the imaginary part of the total map, but our FFT programs cannot
calculate imaginary density.  If we note that the imaginary density is
just a real value multiplied by i, we can absorb the troublesome i
into the phase by recognizing that i = exp(i*pi/2) and rotating the
phase by 90 deg.

   This is nothing fundamental to diffraction.  It is just a trick to
get a program that is designed to calculate purely real values to
calculate a map of imaginary density.  Whether you rotate by +90 or
-90 deg only changes the sign of the result.

   It would be really handy if Coot could take a general set of
structure factors, calculate the complex density and display,
separately, contours for the real and imaginary components of the
density.  Then, even if you didn't notice the presence of anomalous
scattering in your diffraction, you would still see the anomalous
peaks on your display.
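
To make the 90 degree trick concrete, here is a tiny Python/numpy sketch
(mine; the arrays are placeholders, not real data) of how the
coefficients in the convention Dauter describes are assembled before
they go to an FFT program:

import numpy as np

# placeholder anomalous differences and model phases (degrees)
dano     = np.array([120.0,  35.0, 80.0])
phi_calc = np.array([ 10.0, 250.0, 95.0])

# absorb the factor of i by rotating the phase: use (DANO, PHIcalc - 90)
coeffs = dano * np.exp(1j * np.radians(phi_calc - 90.0))

# rotating by +90 instead only flips the sign of the resulting map
coeffs_plus = dano * np.exp(1j * np.radians(phi_calc + 90.0))
print(np.allclose(coeffs, -coeffs_plus))   # True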

Dale Tronrud

On 8/29/2014 11:43 AM, Alexander Aleshin wrote:
 Could anyone remind me how to calculate anomalous  difference
 Fourier maps using model-calculated phases? I was doing it by (1)
 calculating PHcalc from a pdb file using Sfall, then (2) merging
 PHcalc with Dano of experimental SFs, then (3) calculating a map
 with Dano and PHcalc using FFT program of CCP4.
 
 Now, I've read Z. Dauter's et all paper
 http://mcl1.ncifcrf.gov/dauter_pubs/175.pdf, and it said that their
 anomalous maps were calculated using (delF, PHcalc-90degrees). Why
 did they use  -90 degrees?  How does it relay to a (delF, phcalc)
 map?
 
 Thank you for an advice.
 
 Alex Aleshin
 

