Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Hi Frank, I described in the previous e-mail the probabilistic interpretation of B-factors. In the case of very high uncertainty (poorly ordered side chains), I prefer to deposit the conformer representing the maximum a posteriori estimate, even if it does not represent all possible conformations. The maximum a posteriori estimate will have a significant contribution from the most probable conformation of the side chain (prior knowledge) and should not conflict with the likelihood (electron density map). Thus, in practice I model the most probable conformation as long as it is in even very weak electron density, does not overlap significantly with negative difference electron density and does not clash with other residues. If it's probability you're after, if there's no density to guide you (very common!) you'd have to place all likely rotamers that don't clash with anything, and set their occupancies to their probability (as encoded in the rotamer library). Which library? The one for all side chains of a specific type, or the one for a specific type with a given backbone conformation? These are quite different and change with the content of the PDB. 'Hacking' the occupancies is risky business in general: errors are made quite easily. I frequently encounter side chains with partial occupancies but no alternatives; how can I relate this to the experimental data? Even worse, I also see cases where the occupancies of alternates sum up to values greater than 1.00. What does that mean? Is that a local increase of Dark Matter accidentally encoded in the occupancy? This is now veering into data-free protein modeling territory... wasn't the idea to present to the downstream user an atomic representation of what the electron density shows us? Yes, but what we see can be deceiving. Worse, what we're also doing is encoding multiple different things in one place - what database people call poorly normalised, i.e. to understand a data field requires further parsing and if statements. 
In this case: to know whether there was no density, as end-user I'd have to second-guess what exactly those high-B-factor-variable-occupancy atoms mean. Until the PDB is expanded, the conventions need to be clear, and I thought they were: High B-factor == atom is there but density is weak; atom missing == no density to support it. Unfortunately, it is not trivial to decide when there is 'no density'. We must have a good metric to do this, but I don't think it exists yet. Removing atoms is thus very subjective. This explains why I frequently find positive difference density peaks near missing side chains. Leaving side chains in sometimes gives negative difference density but refining them with proper B-factor restraints reduces the problem a lot. There is still the problem of radiation damage, but that is relatively small. At least refining the B-factor is more reproducible and less subjective than making the binary choice to keep or remove an atom. Cheers, Robbie Oh well... phx.
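The two options argued over in this exchange can be contrasted in a few lines. This is a minimal Python sketch, not anyone's actual procedure: the rotamer names and prior probabilities in PRIOR are invented, the log-likelihood values stand in for a real density-fit score, and the clash set is supplied by hand. `map_conformer` picks the single maximum a posteriori rotamer (prior times likelihood, restricted to non-clashing rotamers), as ZO describes; `probability_occupancies` instead spreads occupancy over all non-clashing rotamers in proportion to their prior, which is what Frank's "logical conclusion" would require.

```python
import math

# Hypothetical rotamer library entries for one side chain: name -> prior probability.
# The numbers are invented; real libraries (backbone-dependent or not) differ.
PRIOR = {"mt": 0.50, "tp": 0.30, "mm": 0.15, "tt": 0.05}


def map_conformer(prior, log_likelihood, clashes):
    """Return the maximum a posteriori rotamer among the non-clashing ones.

    log_likelihood: rotamer -> log-likelihood of the map given that rotamer
    (a placeholder for a real density-fit score).
    clashes: set of rotamers rejected for steric clashes.
    """
    candidates = [r for r in prior if r not in clashes]
    # log posterior = log prior + log likelihood (up to a constant)
    return max(candidates, key=lambda r: math.log(prior[r]) + log_likelihood[r])


def probability_occupancies(prior, clashes):
    """Occupancies proportional to the prior, renormalized over non-clashing rotamers."""
    allowed = {r: p for r, p in prior.items() if r not in clashes}
    total = sum(allowed.values())
    return {r: p / total for r, p in allowed.items()}


if __name__ == "__main__":
    flat = {r: 0.0 for r in PRIOR}  # no density to guide us
    print(map_conformer(PRIOR, flat, set()))
    print(probability_occupancies(PRIOR, {"mt"}))
```

With no density (a flat likelihood) the MAP choice is simply the most common rotamer, while the probability-weighted version depends entirely on which library supplied PRIOR, which is exactly Robbie's objection.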
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Hi Robbie, If it's probability you're after, if there's no density to guide you (very common!) you'd have to place all likely rotamers that don't clash with anything, and set their occupancies to their probability (as encoded in the rotamer library). Which library? The one for all side chains of a specific type, or the one for a specific type with a given backbone conformation? These are quite different and change with the content of the PDB. 'Hacking' the occupancies is risky business in general: errors are made quite easily. I frequently encounter side chains with partial occupancies but no alternatives; how can I relate this to the experimental data? Even worse, I also see cases where the occupancies of alternates sum up to values greater than 1.00. What does that mean? Is that a local increase of Dark Matter accidentally encoded in the occupancy? Actually, I wasn't advocating it - I was taking ZO's suggestion to its logical conclusion to point out the problem, namely deciding what is most likely. This you underline with your (very valid) question. Until the PDB is expanded, the conventions need to be clear, and I thought they were: High B-factor == atom is there but density is weak; atom missing == no density to support it. Unfortunately, it is not trivial to decide when there is 'no density'. We must have a good metric to do this, but I don't think it exists yet. Removing atoms is thus very subjective. This explains why I frequently find positive difference density peaks near missing side chains. Leaving side chains in sometimes gives negative difference density but refining them with proper B-factor restraints reduces the problem a lot. There is still the problem of radiation damage, but that is relatively small. At least refining the B-factor is more reproducible and less subjective than making the binary choice to keep or remove an atom. (Radiation damage is NOT a relatively small problem.) 
The fundamental problem remains: we're cramming too many meanings into one number. This the PDB could indeed solve, by giving us another column. (He said airily, blithely launching a totally new flame war.) phx.
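Robbie's two complaints (a partial occupancy with no alternate, and alternates whose occupancies add up to more than 1.00) are easy to screen for mechanically. The sketch below is a hypothetical helper, not a tool mentioned in the thread. It reads fixed-column PDB ATOM/HETATM records (atom name in columns 13-16, altLoc in 17, chain in 22, residue number in 23-26, insertion code in 27, occupancy in 55-60) and groups alternates by chain, residue number, insertion code and atom name; residue names are left out of the key so that alternates modeled as different residue types at the same position are summed together. Legitimate partial occupancies, such as partially occupied solvent or a Met/Se-Met mixture, will also be flagged, so the output is a list of things to look at, not a verdict.

```python
from collections import defaultdict


def occupancy_report(pdb_lines, tol=0.01):
    """Flag atoms whose occupancies look inconsistent.

    Returns a list of (key, total_occupancy, n_alternates, problem) tuples,
    where key = (chain, residue number, insertion code, atom name).
    """
    groups = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = (
            line[21],             # chain identifier (column 22)
            line[22:26].strip(),  # residue sequence number (columns 23-26)
            line[26].strip(),     # insertion code (column 27)
            line[12:16].strip(),  # atom name (columns 13-16)
        )
        alt_loc = line[16].strip()      # alternate location indicator (column 17)
        occupancy = float(line[54:60])  # occupancy (columns 55-60)
        groups[key].append((alt_loc, occupancy))

    problems = []
    for key, entries in groups.items():
        total = sum(occ for _, occ in entries)
        if total > 1.0 + tol:
            # alternates add up to more than one whole atom
            problems.append((key, total, len(entries), "sum > 1"))
        elif len(entries) == 1 and total < 1.0 - tol:
            # a single conformer with partial occupancy and nothing to make up the rest
            problems.append((key, total, 1, "partial, no alternate"))
    return problems
```

Usage: pass the lines of a coordinate file, e.g. `occupancy_report(open(path).read().splitlines())`, where `path` is whatever file you want to check.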
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
The meaning of B-factor is the (scaled) sum of all positional uncertainties, and not just one of its contributors, the Atomic Displacement Parameter that describes the relative displacement of an atom in the crystal lattice by a Gaussian function. That meaning (the sum of all contributions) comes from the procedure that calculates the B-factor in all PDB X-ray deposits, and not from an arbitrary decision by a committee. All programs that refine B-factors calculate an estimate of positional uncertainty, where contributors can be both Gaussian and non-Gaussian. For a non-Gaussian contributor, e.g. multiple occupancy, the exact numerical contribution is a rather complex function, but conceptually it is still an uncertainty estimate. Given the resolution of typical data, we do not have a procedure to decouple Gaussian and non-Gaussian contributors, so we have to live with the B-factor being defined by the refinement procedure. However, we should still improve the estimates of the B-factor, e.g. by changing the restraints. In my experience, Refmac's default restraints on B-factors in side chains are too tight and I adjust them. Still, my preference would be to have harmonic restraints on the square root of B (which is proportional to the rms displacement) rather than on the Bs themselves. It is not we who cram too many meanings into the B-factor; it is a quite fundamental limitation of crystallographic refinement. Zbyszek Otwinowski The fundamental problem remains: we're cramming too many meanings into one number [B factor]. This the PDB could indeed solve, by giving us another column. (He said airily, blithely launching a totally new flame war.) phx.
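ZO's two technical points, that B-factors from independent sources add up and that he would rather restrain the square root of B, can be illustrated with a toy calculation. The Python below is a sketch under stated assumptions: it uses the isotropic relation B = 8*pi^2*<u_x^2>, treats each contribution as an independent Gaussian (so variances, and therefore B values, add under convolution), and uses a bare harmonic penalty with an arbitrary sigma. It is not Refmac's actual B-restraint function, and the function names are hypothetical.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2  # about 78.96


def b_from_rms(u_x):
    """Isotropic B (A^2) from the rms displacement along one axis: B = 8 pi^2 <u_x^2>."""
    return EIGHT_PI_SQ * u_x ** 2


def rms_from_b(b):
    """Invert B = 8 pi^2 <u_x^2> to recover the rms displacement along one axis (A)."""
    return math.sqrt(b / EIGHT_PI_SQ)


def combined_b(*contributions):
    """Independent Gaussian contributions convolve, so their variances (and Bs) add."""
    return sum(contributions)


def harmonic_penalty(b1, b2, on_sqrt=False, sigma=1.0):
    """Toy harmonic restraint between two bonded atoms, applied to B or to sqrt(B)."""
    if on_sqrt:
        d = math.sqrt(b1) - math.sqrt(b2)
    else:
        d = b1 - b2
    return (d / sigma) ** 2


if __name__ == "__main__":
    print(harmonic_penalty(20, 30), harmonic_penalty(100, 110))
    print(harmonic_penalty(20, 30, on_sqrt=True), harmonic_penalty(100, 110, on_sqrt=True))
```

For the same 10 A^2 difference, the penalty on B is identical around B = 20 and around B = 100, whereas the penalty on sqrt(B) is about four times smaller at the higher B, so a sqrt(B) restraint lets already mobile side chains spread out more freely.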
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
In my experience, the Refmac's default restraints on B-factors in side chains are too tight and I adjust them. Concur. See BMC p 640. BR
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
I'm not sure I entirely agree with ZO's assessment that a B factor is a measure of uncertainty. Pedantically, all it really is is an instruction to the refinement program to build some electron density with a certain width and height at a certain location. The result is then compared to the data, parameters are adjusted, etc. I don't think the B factor is somehow converted into an error bar on the calculated electron density, is it? For example, a B-factor of 500 on a carbon atom just means that the peak to build is ~0.02 electron/A^3 tall, and ~3 A wide (full width at half maximum). By comparison, a carbon with B=20 is 1.6 electrons/A^3 tall and ~0.7 A wide (FWHM). One of the bugs that Dale referred to is the fact that most refinement programs do not plot electron density more than 3 A away from each atomic center, so a substantial fraction of the 6 electrons represented by a carbon with B=500 will be sharply cut off, and missing from the FC calculation. Then again, all 6 electrons will be missing if the atoms are simply not modeled, or if the occupancy is zero. The point I am trying to make here is that there is no B factor that will make an atom go away, because the way B factors are implemented is to always conserve the total number of electrons in the atom, but just spread them out over more space. Now, a peak height of 0.02 electrons/A^3 may sound like it might as well be zero, especially when sitting next to a B=20 atom, but what if all the atoms have high B factors? For example, if the average (Wilson) B factor is 80 (like it typically is for a ~4A structure), then the average peak height of a carbon atom is 0.3 electrons/A^3, and then 0.02 electrons/A^3 starts to become more significant. If we consider a ~11 A structure, then the average atomic B factor will be around 500. This B vs resolution relationship is something I derived empirically from the PDB (Holton JSR 2009). 
Specifically, the average B factor for PDB files at a given resolution d is: B = 4*d^2+12. Admittedly, this is on average, but the trend does make physical sense: atoms with high B factors don't contribute very much to high-angle spots. More formally, the problem with using a high B-factor as a flag is that it is not resolution-general. Dale has already pointed this out. Personally, I prefer to think of B factors as an atom-by-atom resolution rather than an error bar, and this is how I tell students to interpret them (using the B = 4*d^2+12 formula). The problem I have with the error bar interpretation is that heterogeneity and uncertainty are not the same thing. That is, just because the atom is jumping around does not mean you don't know where the centroid of the distribution is. The u_x in B=8*pi^2*u_x^2 does reflect the standard error of atomic position in a GIVEN unit cell, but since we are averaging over trillions of cells, the error bar on the AVERAGE atomic position is actually a great deal smaller than u. I think this distinction is important because what we are building is a model of the AVERAGE electron density, not a single molecule. Just my 0.02 electrons -James Holton MAD Scientist
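James's peak heights and his B-versus-resolution rule can be reproduced with a short script. This Python sketch assumes the carbon atom is described by its standard four-Gaussian Cromer-Mann form factor (coefficients as usually tabulated in International Tables Vol. C; worth checking against the table before relying on them), blurred by an isotropic B. Each reciprocal-space term a*exp(-(b+B)*s^2), with s = sin(theta)/lambda, transforms to a real-space Gaussian containing exactly a electrons. The functions `carbon_density`, `half_width` and `average_b` are illustrative names, not part of any package.

```python
import math

# Cromer-Mann coefficients for neutral carbon: f(s) = sum(a_i exp(-b_i s^2)) + c.
CARBON_A = (2.3100, 1.0200, 1.5886, 0.8650)
CARBON_B = (20.8439, 10.2075, 0.5687, 51.6512)
CARBON_C = 0.2156


def carbon_density(r, b_factor):
    """Electron density (e/A^3) at distance r (A) from a carbon atom with isotropic B (A^2).

    Each term a*exp(-w s^2), with w = b + B, Fourier-transforms to
    a * (4 pi / w)^1.5 * exp(-4 pi^2 r^2 / w); the constant c is blurred by B alone.
    """
    terms = list(zip(CARBON_A, CARBON_B)) + [(CARBON_C, 0.0)]
    rho = 0.0
    for a, b in terms:
        w = b + b_factor
        rho += a * (4 * math.pi / w) ** 1.5 * math.exp(-4 * math.pi ** 2 * r ** 2 / w)
    return rho


def half_width(b_factor, r_max=20.0):
    """Radius (A) at which the density falls to half its peak value, by bisection."""
    half = carbon_density(0.0, b_factor) / 2
    lo, hi = 0.0, r_max
    for _ in range(60):
        mid = (lo + hi) / 2
        if carbon_density(mid, b_factor) > half:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def average_b(d):
    """Empirical average B (A^2) for PDB entries at resolution d (A): B = 4 d^2 + 12."""
    return 4 * d ** 2 + 12


if __name__ == "__main__":
    for b in (20, 80, 500):
        print(b, round(carbon_density(0.0, b), 3), round(half_width(b), 2))
```

Run this way, the peak heights come out near 1.6, 0.3 and 0.02 e/A^3 for B = 20, 80 and 500, and `average_b(4)` and `average_b(11)` give 76 and 496 A^2, in line with the figures quoted. The half-widths come out near 0.7 A and 3 A, which suggests the quoted widths are the radius at half maximum; under this model the full width at half maximum is roughly twice that.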
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
In this case, I'm more on ZO's side. Let's say that the refinement program can't get an atom to the right position (for instance, to pick a reasonably realistic example, because you've put a leucine side chain in backwards). In that case, the B-factor for the atom nearest to where there should be one in the structure will get larger to smear out its density and put some in the right place. To a good approximation, the optimal increase in the B-factor will be the one you'd expect for a Gaussian probability distribution, i.e. 8Pi^2/3 times the positional error squared. So a refined B-factor does include a measure of the uncertainty or error in the atom's position. Best wishes, Randy Read
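Randy's rule of thumb is easy to put into numbers. This minimal sketch assumes, as he does, that the positional error behaves like an isotropic Gaussian: an error of delta_r shared over three axes gives <u_x^2> = delta_r^2/3, so the B increase is 8*pi^2/3 times delta_r squared, consistent with the B = 8*pi^2*u_x^2 relation used elsewhere in the thread. The function names are hypothetical.

```python
import math


def b_increase_for_error(delta_r):
    """Approximate B increase (A^2) that best absorbs a positional error delta_r (A).

    Assumes an isotropic Gaussian error: <|dr|^2> = delta_r^2, so <dx^2> = delta_r^2 / 3
    per axis and dB = 8 pi^2 <dx^2> = (8 pi^2 / 3) delta_r^2.
    """
    return 8 * math.pi ** 2 / 3 * delta_r ** 2


def error_from_b_increase(delta_b):
    """Invert the relation: positional error (A) implied by an excess B of delta_b (A^2)."""
    return math.sqrt(3 * delta_b / (8 * math.pi ** 2))


if __name__ == "__main__":
    for dr in (0.5, 1.0):
        print(dr, round(b_increase_for_error(dr), 1))
```

An atom misplaced by 1 A would thus pick up roughly 26 A^2 of extra B, and one off by 0.5 A about 6.6 A^2.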
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Personally I think it is a _good_ thing that those missing atoms are a pain, because it helps ensure you are aware of the problem. As somebody who is in the business of supplying non-structural people with models, and seeing how those models are sometimes (mis)interpreted, I think it's better to inflict that pain than it is to present a model that non-structural people are likely to over-interpret. The PDB provides various manipulated versions of crystal structures, such as biological assemblies. I don't think it would necessarily be a bad idea to build missing atoms back into those sorts of processed files, but for the main deposited entry the best way to make sure the model is not abused is to leave out atoms that can't be modeled accurately. Just as an example since you mention surfaces, some of the people I work with calculate solvent accessible surface areas of individual residues for purposes such as engineering cysteines for chemical conjugation, and if residues are modeled into bogus positions just to say all the atoms are there, software that calculates per-residue SASA has to have a reliable way of knowing to ignore those atoms when calculating the area of neighboring residues. Ad hoc solutions like putting very large values in the B column are not clear-cut for such a software program to interpret. Leaving the atom out completely is pretty unambiguous. -Eric On Mar 31, 2011, at 7:34 PM, Scott Pegan wrote: I agree with Zbyszek on the modeling of side chains and stress the following points: 1) It drives me nuts when I find that a PDB entry is missing atoms from side chains. This requires me to rebuild them to get any use out of the PDB entry, such as relevant surface renderings or electrostatic potential plots. I am an experienced structural biologist, so I can immediately identify that they have been removed and can rebuild them. 
I feel sorry for my fellow scientists from other biological fields who can't perform this task readily; thus removing these atoms from a model limits its usefulness to a wider scientific audience. 2) Not sure if anyone has documented the percentage of side chains actually missing because of radiation damage versus heterogeneity in conformation (e.g. by dissolving a crystal after data collection and sending it for mass spectrometry). Although the former likely happens occasionally, my gut tells me that the latter is significantly more predominant. As a result, the absence of side-chain atoms in the PDB where the main chain is clearly visible in the electron density might make for the best statistics for an experimental model, but does not reflect reality. Scott
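Eric's point that a missing atom is unambiguous for downstream software is easy to act on: a program can list incomplete side chains by comparing the atoms present against the expected side-chain heavy atoms. This Python sketch covers only a handful of residue types in a hand-written table (EXPECTED_SIDE_CHAIN, using standard PDB atom names) and takes already-parsed residues as input; the names are hypothetical, and it is an illustration rather than a replacement for a full residue dictionary.

```python
# Expected side-chain heavy atoms for a few residue types (standard PDB atom names).
# Residue types not listed here (e.g. GLY, ALA, aromatics) are simply skipped.
EXPECTED_SIDE_CHAIN = {
    "SER": {"CB", "OG"},
    "LEU": {"CB", "CG", "CD1", "CD2"},
    "MET": {"CB", "CG", "SD", "CE"},
    "ASP": {"CB", "CG", "OD1", "OD2"},
    "ASN": {"CB", "CG", "OD1", "ND2"},
    "GLU": {"CB", "CG", "CD", "OE1", "OE2"},
    "GLN": {"CB", "CG", "CD", "OE1", "NE2"},
    "LYS": {"CB", "CG", "CD", "CE", "NZ"},
    "ARG": {"CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"},
}


def incomplete_side_chains(residues):
    """residues: iterable of (residue_id, residue_name, atom_names).

    Returns {residue_id: sorted list of missing side-chain atoms} for residue
    types present in EXPECTED_SIDE_CHAIN.
    """
    missing = {}
    for res_id, res_name, atoms in residues:
        expected = EXPECTED_SIDE_CHAIN.get(res_name)
        if expected is None:
            continue  # residue type not covered by this small table
        absent = expected - set(atoms)
        if absent:
            missing[res_id] = sorted(absent)
    return missing
```

A per-residue SASA program could use such a list to decide which residues to report separately, instead of guessing from values in the B column.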
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
This is a lovely summary, and we should make our students read it. - But I'm afraid I do not see how it supports the closing statement in the last paragraph... phx. On 31/03/2011 17:06, Zbyszek Otwinowski wrote: The B-factor in crystallography represents the convolution (sum) of two types of uncertainties about the atom (electron cloud) position: 1) dispersion of atom positions in crystal lattice 2) uncertainty of the experimenter's knowledge about the atom position. In general, uncertainty need not be described by a Gaussian function. However, communicating uncertainty using the second moment of its distribution is a widely accepted practice, with the frequently implied meaning that it corresponds to a Gaussian probability function. B-factor is simply a scaled (by 8 times pi squared) second moment of the uncertainty distribution. In the previous, long thread, confusion was generated by the additional assumption that B-factor also corresponds to a Gaussian probability distribution and not just to a second moment of any probability distribution. Crystallographic literature often implies the Gaussian shape, so there is some justification for such an interpretation, where the more complex probability distribution is represented by the sum of displaced Gaussians, where the area under each Gaussian component corresponds to the occupancy of an alternative conformation. For data with a typical resolution for macromolecular crystallography, such a multi-Gaussian description of the atom position's uncertainty is not practical, as it would lead to instability in the refinement and/or overfitting. Due to this, a simplified description of the atom's position uncertainty by just the second moment of the probability distribution is the right approach. For this reason, the PDB format is highly suitable for the description of positional uncertainties, the only difference with other fields being the unusual form of squaring and then scaling up the standard uncertainty. 
As this calculation can be easily inverted, there is no loss of information. However, in teaching one should probably stress this unusual form of presenting the standard deviation more. A separate issue is the use of restraints on B-factor values, a subject that probably needs a longer discussion. With respect to the previous thread, representing poorly-ordered (so-called 'disordered') side chains by the most likely conformer with appropriately high B-factors is fully justifiable, and currently is probably the best solution to a difficult problem. Zbyszek Otwinowski - they all know what B is and how to look for regions of high B (with, say, pymol) and they know not to make firm conclusions about H-bonds to flaming red side chains. But this knowledge may be quite wrong. If the flaming red really indicates large vibrational motion then yes, one would not bet on stable H-bonds. But if the flaming red indicates that a well-ordered side chain was incorrectly modeled at full occupancy when in fact it is only present at half-occupancy then no, the H-bond could be strong but only present in that half-occupancy conformation. One presumes that the other half-occupancy location (perhaps missing from the model) would have its own H-bonding network. I beg to differ. If a side chain has 2 or more positions, one should be a bit careful about making firm conclusions based on only one of those, even if it isn't clear exactly why one should use caution. Also, isn't the isotropic B we fit at medium resolution more of a spherical cow approximation to physical reality anyway? Phoebe Zbyszek Otwinowski UT Southwestern Medical Center at Dallas 5323 Harry Hines Blvd. Dallas, TX 75390-8816 Tel. 214-645-6385 Fax. 214-645-6353
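ZO's description of B as a scaled second moment, and of alternate conformations as a sum of displaced Gaussians, can be made concrete. Assuming every component is an isotropic Gaussian with its own B, the second moment of the mixture about its centroid is the occupancy-weighted within-component variance plus the occupancy-weighted spread of the component centres. The Python below (the function name is hypothetical) converts that moment back into an equivalent isotropic B. The mixture itself is not Gaussian, which is the source of the confusion ZO describes: a single B preserves the second moment but not the shape.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2


def mixture_b_eq(b_each, positions, occupancies):
    """Equivalent isotropic B (A^2) for a mixture of displaced isotropic Gaussians.

    b_each: B of each component (A^2); positions: (x, y, z) centre of each component (A);
    occupancies: weights that must sum to 1.
    """
    if not math.isclose(sum(occupancies), 1.0):
        raise ValueError("occupancies must sum to 1")
    weighted = list(zip(occupancies, positions))
    centroid = [sum(w * p[k] for w, p in weighted) for k in range(3)]
    # Per-axis mean-square displacement within each component: <u_x^2> = B / (8 pi^2).
    within = sum(w * b / EIGHT_PI_SQ for w, b in zip(occupancies, b_each))
    # Spread of the component centres about the centroid, averaged over x, y and z.
    spread = sum(
        w * sum((p[k] - centroid[k]) ** 2 for k in range(3)) for w, p in weighted
    ) / 3
    return EIGHT_PI_SQ * (within + spread)


if __name__ == "__main__":
    print(mixture_b_eq([20.0, 20.0], [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [0.5, 0.5]))
```

Two half-occupied conformers 2 A apart, each with B = 20, give an equivalent B of 20 + 8*pi^2/3, about 46 A^2; the conversion back to an rms displacement, sqrt(B/(8*pi^2)), is the easy inversion ZO mentions.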
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Dear Zbyszek: Thanks a lot for your good summary. It is very interesting, but do you think there are some references with a more detailed description, especially from a mathematical point of view, about correlating the B-factor to the Gaussian probability distribution (the B-factor unit of A^2 is my first doubt about the probability distribution description)? Thanks again for your efforts! Best Regards, Hailiang
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
On Thursday, March 31, 2011 10:05:22 am Hailiang Zhang wrote: Dear Zbyszek: Thanks a lot for your good summary. It is very interesting but, do you think there are some references for more detailed description, especially from mathematics point of view about correlating B-factor to the Gaussian probability distribution (the B-factor unit of A^2 is my first doubt as for the probability distribution description)? Thanks again for your efforts! Best Regards, Hailiang I already cited the IUCr standard once, but here it is again: Trueblood, et al, 1996; Acta Cryst. A52, 770-781 http://dx.doi.org/10.1107/S0108767396005697 -- Ethan A Merritt Biomolecular Structure Center, K-428 Health Sciences Bldg University of Washington, Seattle 98195-7742
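The Trueblood et al. report sets out the conversions between B, U and the anisotropic U tensor. As an illustration only, this Python sketch converts the six integers of a PDB ANISOU record (U_ij in units of 10^-4 A^2, in an orthonormal Cartesian frame) into an equivalent isotropic B, using U_eq = (U11 + U22 + U33)/3 for a Cartesian frame and B = 8*pi^2*U. The units of A^2 that puzzled Hailiang come directly from U being a mean-square displacement. The function names are hypothetical.

```python
import math


def b_eq_from_anisou(u11, u22, u33, u12=0, u13=0, u23=0):
    """Equivalent isotropic B (A^2) from PDB ANISOU integers (U_ij in 1e-4 A^2).

    In a Cartesian frame U_eq is one third of the trace, so the off-diagonal
    terms u12, u13, u23 do not enter; they are accepted only so that a full
    ANISOU record can be passed through unchanged.
    """
    u_eq = (u11 + u22 + u33) / 3 * 1e-4  # mean-square displacement in A^2
    return 8 * math.pi ** 2 * u_eq


def u_iso_from_b(b):
    """Isotropic mean-square displacement U (A^2) from B: U = B / (8 pi^2)."""
    return b / (8 * math.pi ** 2)
```

Taking sqrt(U) then gives the rms displacement in A, which is the "standard uncertainty" that ZO says is squared and scaled to make B.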
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
While what you say here is quite true and is useful for us to remember, your list is quite short. I can add another 3) The systematic error introduced by assuming full occupancy for all sites. There are, of course, many other factors that we don't account for that our refinement programs tend to dump into the B factors. The definition of that number in the PDB file, as listed in the mmCIF dictionary, only includes your first factor -- http://mmcif.rcsb.org/dictionaries/mmcif_std.dic/Items/_atom_site.B_iso_or_equiv.html and these numbers are routinely interpreted as though that definition is the law. Certainly the whole basis of TLS refinement is that the B factors are really Atomic Displacement Parameters. In addition the stereochemical restraints on B factors are derived from the assumption that these parameters are ADPs. Convoluting all these other factors with the ADPs causes serious problems for those who analyze B factors as measures of motion. The fact that current refinement programs mix all these factors with the ADP for an atom to produce a vaguely defined B factor should be considered a flaw to be corrected and not an opportunity to pile even more factors into this field in the PDB file. Dale Tronrud On 3/31/2011 9:06 AM, Zbyszek Otwinowski wrote: The B-factor in crystallography represents the convolution (sum) of two types of uncertainties about the atom (electron cloud) position: 1) dispersion of atom positions in crystal lattice 2) uncertainty of the experimenter's knowledge about the atom position. In general, uncertainty needs not to be described by Gaussian function. However, communicating uncertainty using the second moment of its distribution is a widely accepted practice, with frequently implied meaning that it corresponds to a Gaussian probability function. B-factor is simply a scaled (by 8 times pi squared) second moment of uncertainty distribution. 
In the previous, long thread, confusion was generated by the additional assumption that B-factor also corresponds to a Gaussian probability distribution and not just to a second moment of any probability distribution. Crystallographic literature often implies the Gaussian shape, so there is some justification for such an interpretation, where the more complex probability distribution is represented by the sum of displaced Gaussians, where the area under each Gaussian component corresponds to the occupancy of an alternative conformation. For data with a typical resolution for macromolecular crystallography, such multi-Gaussian description of the atom position's uncertainty is not practical, as it would lead to instability in the refinement and/or overfitting. Due to this, a simplified description of the atom's position uncertainty by just the second moment of probability distribution is the right approach. For this reason, the PDB format is highly suitable for the description of positional uncertainties, the only difference with other fields being the unusual form of squaring and then scaling up the standard uncertainty. As this calculation can be easily inverted, there is no loss of information. However, in teaching one should probably stress more this unusual form of presenting the standard deviation. A separate issue is the use of restraints on B-factor values, a subject that probably needs a longer discussion. With respect to the previous thread, representing poorly-ordered (so called 'disordered') side chains by the most likely conformer with appropriately high B-factors is fully justifiable, and currently is probably the best solution to a difficult problem. Zbyszek Otwinowski - they all know what B is and how to look for regions of high B (with, say, pymol) and they know not to make firm conclusions about H-bonds to flaming red side chains. But this knowledge may be quite wrong. 
If the flaming red really indicates large vibrational motion then yes, one would not bet on stable H-bonds. But if the flaming red indicates that a well-ordered side chain was incorrectly modeled at full occupancy when in fact it is only present at half-occupancy, then no, the H-bond could be strong but only present in that half-occupancy conformation. One presumes that the other half-occupancy location (perhaps missing from the model) would have its own H-bonding network.

I beg to differ. If a side chain has two or more positions, one should be a bit careful about making firm conclusions based on only one of those, even if it isn't clear exactly why one should use caution. Also, isn't the isotropic B we fit at medium resolution more of a spherical-cow approximation to physical reality anyway?

Phoebe

Zbyszek Otwinowski
UT Southwestern Medical Center at Dallas
5323 Harry Hines Blvd.
Dallas, TX 75390-8816
Tel. 214-645-6385
Fax. 214-645-6353
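Zbyszek's description of alternative conformations as a sum of displaced Gaussians, weighted by occupancy, versus a single B that keeps only the second moment, and Phoebe's spherical-cow remark, can be illustrated by collapsing a few conformers into one isotropic B. This is a minimal sketch, assuming each modeled conformer is an isotropic Gaussian; the function name and the example numbers (two half-occupied positions 1.5 Å apart, each at B = 15 Å²) are hypothetical, not taken from any real structure.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
EIGHT_PI_SQ = 8.0 * math.pi ** 2


def effective_isotropic_b(
    centers: Sequence[Vec3], b_values: Sequence[float], occupancies: Sequence[float]
) -> float:
    """Collapse alternative conformers (isotropic Gaussians) into one isotropic B.

    Conformer i has centre mu_i, per-axis U_i = B_i / (8 pi^2) and weight w_i
    (occupancies renormalised to sum to 1). The mixture's mean-square displacement
    about its mean, averaged over the three axes, is
        U_eff = sum_i w_i * (U_i + |mu_i - mu_bar|^2 / 3),
    and B_eff = 8 pi^2 * U_eff. Only the second moment survives; the bimodal,
    anisotropic shape of the real distribution is lost.
    """
    if not centers or not (len(centers) == len(b_values) == len(occupancies)):
        raise ValueError("need matching, non-empty inputs")
    total = sum(occupancies)
    if total <= 0.0:
        raise ValueError("occupancies must sum to a positive value")
    weights = [q / total for q in occupancies]
    mean = [sum(w * c[k] for w, c in zip(weights, centers)) for k in range(3)]
    u_eff = 0.0
    for w, c, b in zip(weights, centers, b_values):
        sq_dist = sum((c[k] - mean[k]) ** 2 for k in range(3))
        u_eff += w * (b / EIGHT_PI_SQ + sq_dist / 3.0)
    return EIGHT_PI_SQ * u_eff


# Two well-ordered, half-occupied positions 1.5 A apart, each with B = 15 A^2.
b_eff = effective_isotropic_b([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], [15.0, 15.0], [0.5, 0.5])
print(f"single-conformer equivalent B ~ {b_eff:.1f} A^2")
```

For that example the single equivalent B comes out near 30 Å², roughly double the per-conformer value even though each conformer is individually well ordered. A "flaming red" atom of this kind is the ambiguous case argued over in the quoted exchange above.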
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Dale Tronrud wrote:

> While what you say here is quite true and is useful for us to remember, your list is quite short. I can add another: 3) the systematic error introduced by assuming full occupancy for all sites.

You are right that structural heterogeneity is an additional factor. Se-Met expression is one example: the Se-Met residue is often not fully incorporated, so those side chains have a mixed Se-Met/Met composition. Solvent molecules, obviously, may have partial occupancies. Also, in heavily X-ray-exposed crystals, chemical reactions result in the loss of functional groups (e.g. by decarboxylation). However, in most cases, even if side chains have multiple conformations, their total occupancy is 1.0.

> There are, of course, many other factors that we don't account for that our refinement programs tend to dump into the B factors. The definition of that number in the PDB file, as listed in the mmCIF dictionary, only includes your first factor -- http://mmcif.rcsb.org/dictionaries/mmcif_std.dic/Items/_atom_site.B_iso_or_equiv.html -- and these numbers are routinely interpreted as though that definition is the law. Certainly the whole basis of TLS refinement is that the B factors are really Atomic Displacement Parameters. In addition, the stereochemical restraints on B factors are derived from the assumption that these parameters are ADPs. Convoluting all these other factors with the ADPs causes serious problems for those who analyze B factors as measures of motion. The fact that current refinement programs mix all these factors with the ADP for an atom to produce a vaguely defined B factor should be considered a flaw to be corrected and not an opportunity to pile even more factors into this field in the PDB file.

B-factors describe the overall uncertainty of the current model. Refinement programs that do not introduce or remove parts of the model (e.g. that are not able to add additional conformations) intrinsically pile all uncertainties into the B-factors.
Solutions such as the ones you would like to see implemented require a model-building-like approach. The test of success for such an approach would be a substantial decrease in R-free values. If anybody can show that, it would be great.

Zbyszek
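Zbyszek's observation that alternate conformations normally sum to a total occupancy of 1.0, with exceptions such as partial Se-Met incorporation, solvent, and radiation damage, suggests a simple sanity check on a coordinate file. This is a minimal sketch, assuming classic fixed-column PDB ATOM/HETATM records; the function names and the `make_atom_line` helper are hypothetical, and the check only flags sites for a human to look at rather than deciding whether a partial occupancy is justified.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (chain ID, residue number, insertion code, atom name): one atomic site.
# altLoc is deliberately left out so that alternates of the same atom are summed.
SiteKey = Tuple[str, str, str, str]


def occupancy_sums(pdb_lines: Iterable[str]) -> Dict[SiteKey, float]:
    """Sum occupancies over alternate locations for each atomic site.

    Reads fixed PDB columns (1-based): atom name 13-16, chain 22,
    resSeq 23-26, iCode 27, occupancy 55-60.
    """
    sums: Dict[SiteKey, float] = defaultdict(float)
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        key = (line[21], line[22:26].strip(), line[26], line[12:16].strip())
        sums[key] += float(line[54:60])
    return dict(sums)


def flag_occupancies(pdb_lines: Iterable[str], tol: float = 0.01):
    """Return (sites summing above 1, sites summing below 1) for manual review.

    A sum below 1 can be legitimate (partial Se-Met incorporation, solvent,
    radiation damage); a sum above 1 has no physical meaning.
    """
    over: List[Tuple[SiteKey, float]] = []
    partial: List[Tuple[SiteKey, float]] = []
    for key, total in sorted(occupancy_sums(pdb_lines).items()):
        if total > 1.0 + tol:
            over.append((key, total))
        elif total < 1.0 - tol:
            partial.append((key, total))
    return over, partial


def make_atom_line(serial, name, altloc, resname, chain, resseq, occ, b=20.0):
    """Hypothetical helper writing a minimal ATOM record in PDB columns
    (atom name left-justified for simplicity)."""
    return (f"ATOM  {serial:>5} {name:<4}{altloc:1}{resname:>3} {chain:1}{resseq:>4}"
            f"    {0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{occ:>6.2f}{b:>6.2f}")


lines = [
    make_atom_line(1, "CA", " ", "SER", "A", 10, 1.00),
    make_atom_line(2, "OG", "A", "SER", "A", 10, 0.60),  # alternates sum to 1.2
    make_atom_line(3, "OG", "B", "SER", "A", 10, 0.60),
    make_atom_line(4, "SD", " ", "MET", "A", 11, 0.50),  # partial, no alternate
]
over, partial = flag_occupancies(lines)
print("sum > 1:", over)
print("sum < 1:", partial)
```

In the example the serine OG alternates are flagged for summing to 1.2, and the half-occupied methionine SD with no alternate is flagged as partial; whether the latter is a Se-Met/Met mixture or a modeling shortcut still has to be judged against the data.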
Re: [ccp4bb] The meaning of B-factor, was Re: [ccp4bb] what to do with disordered side chains
Regarding the closing statement about the best solution for poorly ordered side chains: I described in the previous e-mail the probabilistic interpretation of B-factors. In the case of very high uncertainty (i.e. poorly ordered side chains), I prefer to deposit the conformer representing the maximum a posteriori, even if it does not represent all possible conformations. The maximum a posteriori will have a significant contribution from the most probable conformation of the side chain (prior knowledge) and should not conflict with the likelihood (electron density map). Thus, in practice I model the most probable conformation as long as it is in even very weak electron density, does not overlap significantly with negative difference electron density, and does not clash with other residues. As a user of PDB files I much prefer the simplest and most informative representation of the result. Removing parts of side chains that carry charges, as already mentioned, is not particularly helpful for downstream uses. NMR-like deposits are not among my favorites, either. Having multiple conformations with low occupancies increases the potential for confusion, while the benefits are not clear to me.

Zbyszek

Frank von Delft wrote:

This is a lovely summary, and we should make our students read it. - But I'm afraid I do not see how it supports the closing statement in the last paragraph... phx.
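Zbyszek's maximum a posteriori criterion can be written as choosing, among candidate rotamers, the one with the largest log prior plus log likelihood, after discarding those that clash or sit in negative difference density. This is a minimal sketch of that idea, not his actual procedure, which the thread describes only in words. The `Candidate` fields, the rotamer names, and all numbers are hypothetical inputs, and the prior depends on which rotamer library is used, a choice raised earlier in the thread.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    """Hypothetical summary of one side-chain rotamer; every field is an assumed input."""
    name: str
    prior: float             # probability from some rotamer library (prior knowledge)
    log_likelihood: float    # agreement with the electron density map, as a log-likelihood
    clashes: bool            # steric clash with other residues
    in_negative_diff: bool   # overlaps significantly with negative difference density


def map_conformer(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the admissible rotamer maximising log prior + log likelihood
    (the log posterior up to a constant), or None if none is admissible."""
    admissible = [
        c for c in candidates
        if c.prior > 0.0 and not c.clashes and not c.in_negative_diff
    ]
    if not admissible:
        return None
    return max(admissible, key=lambda c: math.log(c.prior) + c.log_likelihood)


# Weak, nearly uninformative density: the prior dominates the choice.
candidates = [
    Candidate("rotamer_1", 0.55, -2.0, clashes=False, in_negative_diff=False),
    Candidate("rotamer_2", 0.30, -1.8, clashes=False, in_negative_diff=False),
    Candidate("rotamer_3", 0.15, -0.2, clashes=True, in_negative_diff=False),
]
best = map_conformer(candidates)
print(best.name if best else "no admissible conformer")
```

Here the example prints `rotamer_1`: the density barely distinguishes the first two rotamers, so the more probable one wins, while `rotamer_3` is excluded by its clash despite fitting the density best. Treating the clash and negative-density conditions as hard filters rather than soft penalties is a design choice of this sketch.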