Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
Hi Francisco,

I'll play devil's advocate, but a measurement without an estimate of its error is closer to theology than to science. The fact that the standard deviations are not used for calculating an electron density map via FFT is due only to the hidden assumption that you have a 100% complete, error-free data set extending to sufficiently high (infinite) resolution. When these assumptions do not apply (as is usually the case with physical reality), the simple-minded FFT is not the correct inversion procedure (and the data do not uniquely define a single map). Under these conditions other inversion methods are needed (such as maximum entropy), for which the standard deviations are actively used in calculating the map.

My two cents,
Nicholas

On Tue, 22 May 2012, Francisco Hernandez-Guzman wrote:

> Hello everyone, My apologies if this comes across as basic, but I wanted to get the experts' take on whether or not the sigmaF values are required in the calculation of an electron density map. If I look at the standard ED equation, sigmas don't appear to be a requirement, but all the scripts that I've looked at do require sigma values. I wanted to calculate the electron density for PDB id 1HFS, but the structure factor file only lists the Fo's, Fc's and phases, and no sigmas. Would such a structure factor file be considered incomplete? Thank you for your kind explanation. Francisco

--
Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
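In standard notation, the synthesis Nicholas is describing is the plain Fourier inversion (a sketch; h runs only over the measured reflections):

$$\rho(\mathbf{x}) \;=\; \frac{1}{V}\sum_{\mathbf{h}} |F(\mathbf{h})|\, e^{i\varphi(\mathbf{h})}\, e^{-2\pi i\,\mathbf{h}\cdot\mathbf{x}}$$

The sigma(F) values appear nowhere in the sum, and any unmeasured reflection contributes nothing at all, i.e. it is silently assigned |F| = 0 - which is exactly the hidden completeness and error-free assumption being pointed out.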
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
I may be wrong here (and please by all means correct me), but I think it's not entirely true that experimental errors are not used in modern map calculation algorithms. At the very least, the 2mFo-DFc maps are calibrated to the model error (which can be ideologically seen as the error of experiment if you include model inaccuracies into that). And I quote from Acta D53:240: "REFMAC also includes the sigma_E0 in the derivation of these terms (m and D) which usually leads to improved behavior. In fact in several cases when this has not been so it has been shown that the sigma_I0 were wrongly estimated during data processing." Thus, the experimental errors do affect the maps (albeit indirectly).

I have not done extensive (or any, for that matter) testing, but my evidence-devoid gut feeling is that maps not using experimental errors (which in REFMAC can be done either via a GUI button or by excluding SIGFP from LABIN in a script) will for a practicing crystallographer be essentially indistinguishable. The reason for this is that model errors as estimated by various maximum likelihood algorithms tend to exceed experimental errors. It may be that these estimates are inflated (a heretical thought, but when you think about it, uniform inflation of the SIGMA_wc may have only a proportional impact on the log-likelihood, or even less so when they correlate with experimental errors). Or it may be that the experimental errors are underestimated (another heretical thought). Nevertheless, the perceived situation is that our models are not as good as our data, and therefore experimental errors don't matter. Now I am playing another devil's advocate and I know how crazy this sounds to an unbiased experimental scientist (e.g. if they don't matter, why bother improving data reduction algorithms?).

I guess maps produced in phenix do not use experimental errors in any way, given that the maximum likelihood formalism implemented there does not. Although phenix is not immutable and my understanding may be outdated. But this is not the right forum for pondering this specific question.

Cheers,
Ed.

PS. I fully realize that Francisco's question was more practical (and the answer to that is to run REFMAC without the SIGFP record in LABIN), but isn't thread-hijacking fun? :)

On Wed, 2012-05-23 at 10:05 +0300, Nicholas M Glykos wrote: [snip]

--
I'd jump in myself, if I weren't so good at whistling. Julian, King of Lemurs
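For readers less familiar with the notation in Ed's post (standard sigma-A conventions, not specific to any one program): the 2mFo-DFc map coefficients are

$$\left(2m|F_o| - D|F_c|\right) e^{i\varphi_c},$$

where m is the figure of merit (the expected cosine of the phase error) and D is a scale factor absorbing coordinate error and model incompleteness; both are functions of sigmaA. For acentric reflections the usual Read-type expression is m = I_1(X)/I_0(X) with $X = 2\sigma_A |E_o||E_c| / (1-\sigma_A^2)$, where E denotes normalized structure factors - details given here as a sketch from the standard formulas, so check against the primary references before relying on them.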
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
Ed,

> I may be wrong here (and please by all means correct me), but I think it's not entirely true that experimental errors are not used in modern map calculation algorithms. At the very least, the 2mFo-DFc maps are calibrated to the model error (which can be ideologically seen as the error of experiment if you include model inaccuracies into that).

And I suppose my statement may have been more precise than helpful. Obviously model and experimental errors do factor into the calculation of a 2mFo-DFc map - but are the weighting and structure-factor calculation part of map calculation, or a distinct stage of data processing? I tend to think of them as separate from map calculation, but this may be up for debate (judging by the increasing number of statements along the lines of "I looked at my mtz file in coot and saw X").

[snip]

> Nevertheless, the perceived situation is that our models are not as good as our data, and therefore experimental errors don't matter. Now I am playing another devil's advocate and I know how crazy this sounds to an unbiased experimental scientist (e.g. if they don't matter, why bother improving data reduction algorithms?).

The errors in our models are almost certainly more extensive than the errors in our measurements, but one try at answering this devil's advocate question would be to point out that the usual likelihood equations all require sigF (either as a component of sigma, or for bootstrapping sigma). I've only done limited testing related to this (it was actually for something else), but likelihood equations produce strange results if you try to get them to ignore sigF.

Pete

[snip]
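The formal entry point for sigF that Pete alludes to can be sketched as follows: for an acentric reflection the amplitude likelihood is the Rice distribution,

$$P(|F_o|\,;\,|F_c|) \;=\; \frac{2|F_o|}{\Sigma}\, \exp\!\left(-\frac{|F_o|^2 + D^2|F_c|^2}{\Sigma}\right) I_0\!\left(\frac{2D\,|F_o||F_c|}{\Sigma}\right),$$

where the variance Sigma is dominated by the model-error term, and programs that use the experimental uncertainties inflate it, schematically $\Sigma \rightarrow \Sigma + \sigma^2(F_o)$ (the precise form of the inflation varies between implementations). Setting sigma(Fo) = 0 everywhere leaves the formalism intact, which is why refinement can proceed without sigmas at all.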
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
Hi Ed,

> I may be wrong here (and please by all means correct me), but I think it's not entirely true that experimental errors are not used in modern map calculation algorithms. At the very least, the 2mFo-DFc maps are calibrated to the model error (which can be ideologically seen as the error of experiment if you include model inaccuracies into that).

This is an amplitude modification. It does not change the fact that the sigmas are not being used in the inversion procedure [and also does not change the (non-)treatment of missing data]. A more direct and relevant example to discuss (with respect to Francisco's question) would be the calculation of a Patterson synthesis (where the phases are known and fixed).

> I have not done extensive (or any, for that matter) testing, but my evidence-devoid gut feeling is that maps not using experimental errors (which in REFMAC can be done either via a GUI button or by excluding SIGFP from LABIN in a script) will for a practicing crystallographer be essentially indistinguishable.

It seems that although you are not doubting the importance of maximum likelihood for refinement, you do seem to doubt the importance of closely related probabilistic methods (such as maximum entropy methods) for map calculation. I think you can't have it both ways ... :-)

> The reason for this is that model errors as estimated by various maximum likelihood algorithms tend to exceed experimental errors. It may be that these estimates are inflated (a heretical thought, but when you think about it, uniform inflation of the SIGMA_wc may have only a proportional impact on the log-likelihood, or even less so when they correlate with experimental errors). Or it may be that the experimental errors are underestimated (another heretical thought).

My experience from comparing conventional (FFT-based) and maximum-entropy-related maps is that the main source of differences between the two maps has more to do with missing data (especially low-resolution overloaded reflections) and putative outliers (for difference Patterson maps), but in certain cases (with very accurate or inaccurate data) standard deviations do matter.

All the best,
Nicholas

--
Nicholas M. Glykos, http://utopia.duth.gr/~glykos/
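The Patterson example is a clean one because no phases enter at all; using Friedel symmetry the synthesis reduces to (standard notation)

$$P(\mathbf{u}) \;=\; \frac{1}{V}\sum_{\mathbf{h}} |F(\mathbf{h})|^2\, \cos\!\left(2\pi\,\mathbf{h}\cdot\mathbf{u}\right),$$

so the only experimental inputs are the intensities and their sigmas - and a straight FFT of these coefficients still discards the sigmas and zero-fills whatever is missing.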
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
Nicholas,

> My experience from comparing conventional (FFT-based) and maximum-entropy-related maps is that the main source of differences between the two maps has more to do with missing data ... [snip]

I'm curious - which programs have you used for maximum-entropy map calculation?

Pete
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
On Wed, 2012-05-23 at 10:02 -0500, Pete Meyer wrote:

> Obviously model and experimental errors do factor into the calculation of a 2mFo-DFc map - but are the weighting and structure-factor calculation part of map calculation, or a distinct stage of data processing?

Oh, I see. Sure, once the map coefficients are already available, the further calculation doesn't need sigmas, or even Fo's/Fc's. I think the practical question Francisco asked comes to this: if you have no SIGFP column in the input mtz-file, REFMAC will by default fail (phenix probably won't - at least I've seen datasets where all the sigmas were reset to 1.00 and it didn't change the output a bit). So one naturally thinks that sigmas are needed/used in map calculation. Again in practical terms, just remove SIGFP from LABIN and it works, because refmac can do it both ways.

Cheers,
Ed.

--
After much deep and profound brain things inside my head, I have decided to thank you for bringing peace to our home. Julian, King of Lemurs
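For the practical route Ed describes, a minimal CCP4-style script might look like the sketch below. The MTZ column labels (FP, SIGFP, FreeR_flag) and file names are assumptions about the particular data set - check them against your own MTZ header.

```
#!/bin/sh
# Sketch: zero-cycle REFMAC5 runs to obtain map coefficients,
# first with the experimental sigmas included ...
refmac5 HKLIN in.mtz XYZIN in.pdb HKLOUT out_sig.mtz XYZOUT out_sig.pdb << eof
labin FP=FP SIGFP=SIGFP FREE=FreeR_flag
refi ncyc 0
end
eof

# ... then ignoring them: the only change is dropping SIGFP from LABIN.
refmac5 HKLIN in.mtz XYZIN in.pdb HKLOUT out_nosig.mtz XYZOUT out_nosig.pdb << eof
labin FP=FP FREE=FreeR_flag
refi ncyc 0
end
eof
```

Comparing the two output maps side by side in coot is then a quick empirical test of how much the sigmas actually matter for a given data set.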
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
On Wed, 2012-05-23 at 18:06 +0300, Nicholas M Glykos wrote:

> This is an amplitude modification. It does not change the fact that the sigmas are not being used in the inversion procedure

Nicholas,

I am not sure I understand this - perhaps we are talking about different things. Even if by inversion procedure you mean the simple calculation of (2Fo-Fc)*exp(i*phi), the Fc is still technically a product of the refinement, which, unless based on a trivial least-squares target (i.e. no weights), does factor in experimental errors. The (2mFo-DFc) map is even more obviously dependent on the errors. Again, I believe that the differences will be minor, but if one calculates a map with refmac either with or without factoring in experimental errors, there will be *some difference*. Thus, the experimental errors will affect the resulting map. Could you please clarify?

Cheers,
Ed

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy? Julian, King of Lemurs
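To make the two quantities under discussion explicit (standard notation, nothing program-specific): the unweighted and likelihood-weighted syntheses are

$$\rho(\mathbf{x}) = \frac{1}{V}\sum_{\mathbf{h}} \left(2|F_o| - |F_c|\right) e^{i\varphi_c}\, e^{-2\pi i\,\mathbf{h}\cdot\mathbf{x}} \quad\text{vs.}\quad \rho(\mathbf{x}) = \frac{1}{V}\sum_{\mathbf{h}} \left(2m|F_o| - D|F_c|\right) e^{i\varphi_c}\, e^{-2\pi i\,\mathbf{h}\cdot\mathbf{x}}.$$

Ed's point is that sigma(Fo) reaches the right-hand map only indirectly - through m, D and the refined Fc - while the FFT step itself is identical in both cases, which is precisely where his argument meets Nicholas's.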
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
On Wed, 2012-05-23 at 18:06 +0300, Nicholas M Glykos wrote:

> It seems that although you are not doubting the importance of maximum likelihood for refinement, you do seem to doubt the importance of closely related probabilistic methods (such as maximum entropy methods) for map calculation. I think you can't have it both ways ... :-)

Nicholas,

I think that we are not comparing ML to no-ML (or maximum entropy), but rather ML inflated by experimental errors vs pure ML that ignores them. I may be crazy or stupid (or both), but certainly not crazy/stupid enough to doubt the importance of maximum likelihood for refinement. (On the other hand, one who promises to never doubt maximum likelihood shall never use SHELX :)

Cheers,
Ed.

--
I don't know why the sacrifice thing didn't work. Science behind it seemed so solid. Julian, King of Lemurs
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
I just wanted to take a moment to thank all of the respondents to the post. Indeed, my question was more practical in nature, since I wanted to see the density around the ligand in question. From the first suggestions, I quickly managed to generate the maps and accomplish my goal (special thanks to Robbie for actually sending me the converted mtz file from the PDB cif entry). The additional comments have also been highly educational and helpful in furthering my understanding of some more in-depth crystallography concepts.

Thank you,
Francisco

PS. The PDB_REDO entry (http://www.cmbi.ru.nl/pdb_redo/hf/1hfs/index.html) was indeed a great resource and I'm certain to use it again. Thanks!

From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Francisco Hernandez-Guzman
Sent: Tuesday, May 22, 2012 9:28 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] Calculating ED Maps from structure factor files with no sigma

[snip]
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
On 05/23/12 08:06, Nicholas M Glykos wrote:

[snip]

> My experience from comparing conventional (FFT-based) and maximum-entropy-related maps is that the main source of differences between the two maps has more to do with missing data (especially low-resolution overloaded reflections) and putative outliers (for difference Patterson maps), but in certain cases (with very accurate or inaccurate data) standard deviations do matter.

In a continuation of this torturous diversion from the original question...

Since your concern is not how the sigma(Fo) plays out in refinement but how uncertainties are dealt with in the map calculation itself (where an FFT calculates the most probable density values and maximum entropy would calculate the best, or centroid, density values), I believe the most relevant measure of the uncertainty of the Fourier coefficients would be sigma(2mFo-DFc). This would be estimated from a complex calculation of sigma(sigmaA), sigma(Fo), sigma(Fc) and sigma(Phic). I expect that the contribution of sigma(Fo) would be one of the smallest contributors to this calculation, as long as Fo is observed. I wouldn't expect the loss of sigma(Fo) to be catastrophic. Wouldn't sigma(sigmaA) be the largest component, since sigmaA is a function of resolution and based only on the test set?

Dale Tronrud
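A first-order propagation sketch of the quantity Dale names, treating the contributions as independent (an idealisation - in practice m, D and Fc are correlated through the refinement, and the sigma(Phic) term is omitted here):

$$\sigma^2\!\left(2m|F_o| - D|F_c|\right) \;\approx\; (2m)^2\,\sigma^2(F_o) \;+\; 4|F_o|^2\,\sigma^2(m) \;+\; D^2\,\sigma^2(F_c) \;+\; |F_c|^2\,\sigma^2(D)$$

Since m and D inherit their uncertainty largely from sigmaA (estimated from the limited test set), the sigma(m) and sigma(D) terms plausibly dominate, which is the intuition behind Dale's closing question.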
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
Hi Ed,

> I am not sure I understand this - perhaps we are talking about different things. Even if by inversion procedure you mean the simple calculation of (2Fo-Fc)*exp(i*phi), the Fc is still technically a product of the refinement, which, unless based on a trivial least-squares target (i.e. no weights), does factor in experimental errors. The (2mFo-DFc) map is even more obviously dependent on the errors. Again, I believe that the differences will be minor, but if one calculates a map with refmac either with or without factoring in experimental errors, there will be *some difference*. Thus, the experimental errors will affect the resulting map. Could you please clarify?

Yes, we are talking about different things. I refer to the case where we have an amplitude term with its uncertainty (no matter whether it is Fo or Fo^2 or Fo-Fc or 2mFo-DFc or ...) plus a phase with its uncertainty. In normal everyday applications we use the FFT, which ignores (i) the uncertainties of both terms, and (ii) the missing data. By doing an FFT we produce a map which exactly reproduces the input data (even the missing data, which are reproduced with an amplitude of zero). What I have been saying is that in the presence of uncertainties and missing information the data do not define a single map, but a whole set of maps which are statistically consistent with the data, and the question then arises: 'which map should I be looking at?'. I happened to mention the maximum entropy method as a possible solution to this problem.

> I think that we are not comparing ML to no-ML (or maximum entropy), but rather ML inflated by experimental errors vs pure ML that ignores them. I may be crazy or stupid (or both), but certainly not crazy/stupid enough to doubt the importance of maximum likelihood for refinement. (On the other hand, one who promises to never doubt maximum likelihood shall never use SHELX :)

We definitely talk about different things. My arguments had nothing to do with the treatment of errors in refinement. The question I was tackling was how you go from |F|, sig(|F|), phase to a map in the presence of errors and missing data.

Nicholas

--
Nicholas M. Glykos, http://utopia.duth.gr/~glykos/
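Schematically, the maximum-entropy prescription Nicholas refers to selects, from all maps statistically consistent with the data, the one that maximizes an entropy functional (a sketch of the standard Gull-Daniell form; m_j is a prior or reference density, and details vary between implementations):

$$\text{maximize}\quad S(\rho) = -\sum_{j} \rho_j \ln\frac{\rho_j}{m_j} \qquad \text{subject to}\qquad \chi^2 = \sum_{\mathbf{h}} \frac{\big(|F_{\mathrm{calc}}(\mathbf{h};\rho)| - |F_o(\mathbf{h})|\big)^2}{\sigma^2(\mathbf{h})} \;\le\; N_{\mathrm{obs}}$$

Here the sigmas enter explicitly through chi-squared, and missing reflections are simply absent from the constraint rather than being forced to zero amplitude - the two respects in which this inversion differs from a plain FFT.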
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
In SHELXL, a refinement program sometimes used by small molecule crystallographers, all Fourier maps for at least the last 20 years have been weighted by Ic^2/(Ic^2+sigma^2(I)), where Ic is the calculated intensity (squared structure factor) and sigma(I) is the square root of 1/w; w is the weight assigned to a reflection in the refinement (e.g. w = 1/(sig(I)^2 + (g*I)^2), where sig(I) is the esd of the measured intensity I and g is a small constant). This purely empirical scheme appears to result in a significant reduction in the noise level of the map, at least for typical small molecule structures. Such schemes have been called 'maximum likelihood by intuition'; a proper maximum likelihood treatment taking the esds of the intensities into account would of course do much better.

George

On 05/23/2012 06:59 PM, Dale Tronrud wrote: [snip]

--
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry, University of Goettingen,
Tammannstr. 4, D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582
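Restating George's scheme as formulas (taken directly from his description; only the subscript distinguishing the refinement-weight esd from the map-weighting sigma is added here for clarity):

$$w_{\mathrm{map}} = \frac{I_c^2}{I_c^2 + \sigma^2(I)}, \qquad \sigma(I) = \sqrt{1/w}, \qquad w = \frac{1}{\sigma_{\mathrm{meas}}^2(I) + (gI)^2}$$

So a weakly measured reflection (large esd, hence small refinement weight w and large sigma(I)) is down-weighted in the map roughly in proportion to how little its measurement is trusted relative to the calculated intensity.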
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
Hi Pete,

> I'm curious - which programs have you used for maximum-entropy map calculation?

Thanks, I thought no-one would ask :-)

http://utopia.duth.gr/~glykos/graphent.html

Don't download the program today. Or tomorrow. This coming weekend there will be a new release which will contain MacOSX executables.

Nicholas

--
Nicholas M. Glykos, http://utopia.duth.gr/~glykos/
Re: [ccp4bb] Calculating ED Maps from structure factor files with no sigma
Your understanding is correct: sigmaF values aren't required for calculating electron density. Many programs that calculate maps have an option to use the F/sigmaF ratio to threshold the amplitudes used in map calculation - which would require sigmaF. This isn't something I've seen used recently. The presence of sigF is also sometimes used as a proxy for confirming that the data are observed rather than calculated.

Pete

Francisco Hernandez-Guzman wrote: [snip]
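Concretely, the thresholding Pete mentions keeps a reflection h in the synthesis only if

$$\frac{|F_o(\mathbf{h})|}{\sigma\!\left(F_o(\mathbf{h})\right)} \;\ge\; k,$$

with cutoffs around k = 2-3 in older practice (the value here is illustrative, not prescribed by any particular program). Discarding weak data this way is generally discouraged nowadays, consistent with Pete's remark that he has not seen it used recently.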