Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Dear Graeme,

some points for further clarification. The CORRECT corrections you mention all depend on the geometric description of the experiment. This geometric description is refined by CORRECT, to come up with accurate values for

a) application of the polarization correction (which you were mentioning)
b) application of the zeta factor, which has to do with the lengthening of the path of a reflection passing through the Ewald sphere at an angle
c) intensity correction depending on finite detector thickness, detector material, wavelength, and angle (you were also mentioning this; it is the first item in the SILICON article in XDSwiki)
d) positions of reflections on the surface of the detector. This is the second item in the SILICON article in XDSwiki. See also http://xds.mpimf-heidelberg.mpg.de/html_doc/xds_parameters.html#SILICON=
e) air absorption (http://xds.mpimf-heidelberg.mpg.de/html_doc/xds_parameters.html#AIR=)

Maybe I forgot something, but this may complete the picture somewhat.

best,
Kay

On Mon, 17 Nov 2014 09:44:13, Graeme Winter graeme.win...@gmail.com wrote:

Dear Nukri,

The following is my opinion, which I think is worth discussing, and is based on my understanding of what XDS does in the CORRECT step.

Firstly, I tend to find the global refinement in the CORRECT step useful for getting a good unit cell and for recycling the orientation matrix etc. for reintegration. This is not related to scaling, but is useful, e.g.:

http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Optimisation#Re-INTEGRATEing_with_the_correct_spacegroup.2C_refined_geometry_and_fine-slicing_of_profiles

More relevant to the intensities: in integration the LP correction is calculated assuming an unpolarized beam. If the data are from a synchrotron they need to be corrected again for the correct polarization, something which the CORRECT step does (obviously given this on the command-line). Pointless will also do this but assumes, unless given a correct value, that the beam is quite polarized. Mostly: care needs to be taken, particularly if using a wavelength which may be confused with a lab source...

I also understand that the XDS CORRECT step applies a DQE correction for Pilatus data, taking into account the geometry of the experiment, the sensor thickness and the photon energy. If you have a two theta offset and are using relatively high energy (say 14 keV or so?) then this may have odd effects on your data. At detector two theta = 0 this is less of a problem. This can be a gotcha with processing small molecule data recorded with a little Pilatus.

Best wishes
Graeme
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Dear Nukri,

The following is my opinion, which I think is worth discussing, and is based on my understanding of what XDS does in the CORRECT step.

Firstly, I tend to find the global refinement in the CORRECT step useful for getting a good unit cell and for recycling the orientation matrix etc. for reintegration. This is not related to scaling, but is useful, e.g.:

http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Optimisation#Re-INTEGRATEing_with_the_correct_spacegroup.2C_refined_geometry_and_fine-slicing_of_profiles

More relevant to the intensities: in integration the LP correction is calculated assuming an unpolarized beam. If the data are from a synchrotron they need to be corrected again for the correct polarization, something which the CORRECT step does (obviously given this on the command-line). Pointless will also do this but assumes, unless given a correct value, that the beam is quite polarized. Mostly: care needs to be taken, particularly if using a wavelength which may be confused with a lab source...

I also understand that the XDS CORRECT step applies a DQE correction for Pilatus data, taking into account the geometry of the experiment, the sensor thickness and the photon energy. If you have a two theta offset and are using relatively high energy (say 14 keV or so?) then this may have odd effects on your data. At detector two theta = 0 this is less of a problem. This can be a gotcha with processing small molecule data recorded with a little Pilatus.

Best wishes
Graeme

On Fri Nov 14 2014 at 6:15:31 PM Sanishvili, Ruslan rsanishv...@anl.gov wrote:

Dear Graeme,

Could you elaborate on "There are also some subtleties to making (b) work properly..." some more? I have a feeling, from observing the beamline users, that many choose to use this option. It would be very helpful for them to know what those subtleties are and how best to make it work properly.

Many thanks,
Nukri
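To make the point about sensor thickness, photon energy and two theta offset more concrete, here is a minimal sketch of how the absorbed fraction in a flat sensor changes with incidence angle. It is not the XDS implementation: it only applies the Beer-Lambert law to a slab, and the function names, the 320 micron thickness and both attenuation lengths are illustrative assumptions that would have to be replaced with real values for a given detector and wavelength.

```python
import math


def absorbed_fraction(thickness_um, attenuation_length_um, incidence_deg):
    """Fraction of photons absorbed in a flat sensor slab (Beer-Lambert).

    incidence_deg is the angle between the diffracted ray and the sensor
    normal, so 0 means the ray hits the sensor perpendicularly.
    """
    if not 0.0 <= incidence_deg < 90.0:
        raise ValueError("incidence angle must be in [0, 90) degrees")
    # An oblique ray travels a longer path through the sensor material.
    path_um = thickness_um / math.cos(math.radians(incidence_deg))
    return 1.0 - math.exp(-path_um / attenuation_length_um)


def relative_correction(thickness_um, attenuation_length_um, incidence_deg):
    """Factor that puts an oblique reflection on the normal-incidence scale."""
    normal = absorbed_fraction(thickness_um, attenuation_length_um, 0.0)
    oblique = absorbed_fraction(thickness_um, attenuation_length_um, incidence_deg)
    return normal / oblique


if __name__ == "__main__":
    thickness = 320.0  # microns, assumed sensor thickness
    # Hypothetical attenuation lengths: short (softer X-rays), long (harder X-rays).
    for label, att_len in (("short attenuation length", 100.0),
                           ("long attenuation length", 600.0)):
        for angle in (0.0, 20.0, 45.0):
            factor = relative_correction(thickness, att_len, angle)
            print(f"{label}, {angle:4.1f} deg: factor {factor:.3f}")
```

Under these assumptions the factor stays close to 1 when the sensor absorbs nearly everything, and moves further from 1 as the attenuation length grows relative to the thickness, which is the regime where a two theta offset would matter most.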
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Actually Pointless knows that the INTEGRATE file is corrected for an unpolarised beam and recorrects for a synchrotron unless the wavelength is one of the home source ones. See docs. You can specify it explicitly, I think.

Phil

On 17 Nov 2014, at 09:44, Graeme Winter graeme.win...@gmail.com wrote:

Dear Nukri,

The following is my opinion, which I think is worth discussing, and is based on my understanding of what XDS does in the CORRECT step.

Firstly, I tend to find the global refinement in the CORRECT step useful for getting a good unit cell and for recycling the orientation matrix etc. for reintegration. This is not related to scaling, but is useful, e.g.:

http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Optimisation#Re-INTEGRATEing_with_the_correct_spacegroup.2C_refined_geometry_and_fine-slicing_of_profiles

More relevant to the intensities: in integration the LP correction is calculated assuming an unpolarized beam. If the data are from a synchrotron they need to be corrected again for the correct polarization, something which the CORRECT step does (obviously given this on the command-line). Pointless will also do this but assumes, unless given a correct value, that the beam is quite polarized. Mostly: care needs to be taken, particularly if using a wavelength which may be confused with a lab source...

I also understand that the XDS CORRECT step applies a DQE correction for Pilatus data, taking into account the geometry of the experiment, the sensor thickness and the photon energy. If you have a two theta offset and are using relatively high energy (say 14 keV or so?) then this may have odd effects on your data. At detector two theta = 0 this is less of a problem. This can be a gotcha with processing small molecule data recorded with a little Pilatus.

Best wishes
Graeme
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Hi Phil,

Yep: I was not clear perhaps, but I did say this: "Pointless will also do this but assumes, unless given a correct value, that the beam is quite polarized. Mostly: care needs to be taken, particularly if using a wavelength which may be confused with a lab source..."

For others: if you do wish to set the polarization exactly, the keyword is POLARISATION:

POLARISATION [XDS | MOSFLM] [polarisation_factor]

INTEGRATE.HKL files from XDS have been corrected for the polarisation from an unpolarised incident beam, but not for the additional polarisation correction from a synchrotron source, so this additional correction needs to be applied, and will be applied by default for this file type. This command has no effect on other types of input files.

There seem to be two definitions for the polarisation ratio polarisation_factor:

- the definition used in e.g. Mosflm, which follows Kahn et al. (ref below), J' in their Appendix: this has a value of 0 for an unpolarised beam and +1.0 for a fully polarised synchrotron beam
- the definition used in XDS, parameter FRACTION_OF_POLARIZATION: this has a value 0.5 for an unpolarised beam and +1.0 for a fully polarised synchrotron beam

Here the value given is assumed to be in the XDS convention, unless the subkeyword MOSFLM is given, and it is then converted to the Kahn/Mosflm convention for internal use. Set polarisation_factor = 0.0 for an unpolarised beam. The default value if not set explicitly = 0.99, = XDS 0.98, unless the wavelength corresponds to a likely in-house source, in which case the unpolarised value is left unchanged (recognised wavelengths are CuKalpha 1.5418 +- 0.0019, Mo 0.7107 +- 0.0002, Cr 2.29 +- 0.01)

(Reference: Kahn, Fourme, Gadet, Janin, Dumas, André, J. Appl. Cryst. (1982). 15, 330-337)

From: http://www.ccp4.ac.uk/html/pointless.html#polarisation

Cheerio
Graeme

On Mon Nov 17 2014 at 10:22:56 AM Phil Evans p...@mrc-lmb.cam.ac.uk wrote:

Actually Pointless knows that the INTEGRATE file is corrected for an unpolarised beam and recorrects for a synchrotron unless the wavelength is one of the home source ones. See docs. You can specify it explicitly, I think.

Phil
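The two polarisation conventions and the in-house wavelength check quoted from the Pointless documentation above can be sketched in a few lines. This is not Pointless code: the function names are hypothetical, and the linear mapping between the conventions is an assumption chosen only because it reproduces the two end points given in the text (XDS 0.5 and 1.0 against Kahn/Mosflm 0 and +1.0). The wavelength windows are the ones listed in the documentation excerpt.

```python
# Recognised in-house wavelengths (Angstrom) and tolerances, as quoted above.
IN_HOUSE_LINES = {
    "CuKalpha": (1.5418, 0.0019),
    "Mo": (0.7107, 0.0002),
    "Cr": (2.29, 0.01),
}


def xds_to_mosflm(fraction_of_polarization):
    """Convert an XDS-style value (0.5..1.0) to the Kahn/Mosflm style (0..1).

    ASSUMPTION: a linear map through the two quoted end points.
    """
    if not 0.5 <= fraction_of_polarization <= 1.0:
        raise ValueError("XDS-style value must lie between 0.5 and 1.0")
    return 2.0 * fraction_of_polarization - 1.0


def mosflm_to_xds(polarisation_factor):
    """Inverse of xds_to_mosflm, under the same linear assumption."""
    if not 0.0 <= polarisation_factor <= 1.0:
        raise ValueError("Kahn/Mosflm-style value must lie between 0 and 1")
    return 0.5 * (polarisation_factor + 1.0)


def likely_in_house(wavelength):
    """Return the matching line name if the wavelength looks like a lab source."""
    for name, (centre, tolerance) in IN_HOUSE_LINES.items():
        if abs(wavelength - centre) <= tolerance:
            return name
    return None


if __name__ == "__main__":
    for wavelength in (0.9795, 1.5400, 0.7107):
        print(wavelength, likely_in_house(wavelength))
```

The second wavelength in the usage loop shows the gotcha mentioned in the thread: synchrotron data collected at 1.5400 A fall inside the CuKalpha window, so a check of this kind would treat them as unpolarised unless the polarisation is set explicitly.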
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Dear Graeme,

Could you elaborate on "There are also some subtleties to making (b) work properly..." some more? I have a feeling, from observing the beamline users, that many choose to use this option. It would be very helpful for them to know what those subtleties are and how best to make it work properly.

Many thanks,
Nukri

Ruslan Sanishvili (Nukri)
Macromolecular Crystallographer
GM/CA@APS
X-ray Science Division, ANL
9700 S. Cass Ave.
Lemont, IL 60439
Tel: (630)252-0665
Fax: (630)252-0667
rsanishv...@anl.gov

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Graeme Winter [graeme.win...@gmail.com]
Sent: Thursday, November 13, 2014 2:15 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS

Dear Kay,

Just to comment on (e), since you say you don't know why anyone would want to do this, yet this is exactly what xia2 -3d does :o)

I use AIMLESS to merge data already scaled by XDS CORRECT or XSCALE as a way to get a report on the merging statistics which includes all of the AIMLESS analysis, and to generate harvesting files for deposition.

Like you, I look forward to studies of (a) - (e), and think that of all of these (c) is by far the worst idea, from gut instinct. There are also some subtleties to making (b) work properly...

For anyone who has time on their hands and would like to do this study, be sure to consider a range of crystal symmetries, as it is possible that some strategies which are safe in PG 422 (say) are not in PG 2.

Best wishes
Graeme
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Dear Kay,

Just to comment on (e), since you say you don't know why anyone would want to do this, yet this is exactly what xia2 -3d does :o)

I use AIMLESS to merge data already scaled by XDS CORRECT or XSCALE as a way to get a report on the merging statistics which includes all of the AIMLESS analysis, and to generate harvesting files for deposition.

Like you, I look forward to studies of (a) - (e), and think that of all of these (c) is by far the worst idea, from gut instinct. There are also some subtleties to making (b) work properly...

For anyone who has time on their hands and would like to do this study, be sure to consider a range of crystal symmetries, as it is possible that some strategies which are safe in PG 422 (say) are not in PG 2.

Best wishes
Graeme

On Wed Nov 12 2014 at 10:07:10 PM Kay Diederichs kay.diederi...@uni-konstanz.de wrote:

Hi Wolfram,

it took me a while until I realized that you mean "overfitting" when you said "o-word". You can abuse XDS in a number of ways, and I would call them overfitting the data, although that would be using the word in a somewhat strained way: reducing WFAC1 below 1 and decreasing REFLECTIONS/CORRECTION_FACTOR below 50 come to mind, but in an extended sense there are other ways: rejecting frames for no other reason than that they have low I/sigma or high Rmeas, ...

People always seem to find ways to beautify their precision indicators, but they are just fooling themselves, because rejecting data just for cosmetic reasons creates bias. In other words, they trade random error against systematic error. Guess what is worse.

A deeper reason for the problem is that crystallographers have been fixated on data R-factors for decades, and have become really spoilt by this. Our science has been completely misled when it comes to data statistics, and is recovering only slowly.

Concerning non-cautious use of SCALA/AIMLESS after CORRECT: actually I know of no systematic studies in this respect. But I know one thing: it is better to be critical with respect to recipes than to follow them blindly. So I suggest the following project: compare SAD structure solution with the following routes

a) INTEGRATE - CORRECT scaling - SHELXD
b) INTEGRATE - AIMLESS scaling - SHELXD
c) INTEGRATE - CORRECT+AIMLESS scaling - SHELXD
d) INTEGRATE - CORRECT but scaling switched off - AIMLESS scaling - SHELXD
e) INTEGRATE - CORRECT scaling - AIMLESS but scaling switched off - SHELXD

and report here. You can add XSCALE into the mix but that won't change the picture, since it does the exact same calculations for multiple datasets as CORRECT does for single datasets.

Personally, I don't understand why people would _want_ to do c), d) or e) because that's just added complexity, and additional sources of error. I'm looking forward to the results of such studies!

Kay

On Wed, 12 Nov 2014 12:41:28 -0500, wtempel wtem...@gmail.com wrote:

Hello Kay,

you said the o-word, and you are familiar with the inner workings of XDS. Has the data-to-parameter ratio in even complex scaling models become so small that a doubling (worst case) of model parameters would be a serious concern? Could one detect such overfitting by, say, comparing (molecular) model R-factors between refinement against the once (CORRECT) scaled or twice (CORRECT+AIMLESS) scaled data?

Thank you,
Wolfram

On Wed, Nov 12, 2014 at 10:32 AM, Kay Diederichs kay.diederi...@uni-konstanz.de wrote:

Hi Tim,

this is incorrect. XSCALE determines the relative scale and B in a first step (this is what you describe). It then, in a second step, re-determines all scale factors (exactly as CORRECT does for the individual data sets), at the exact same supporting points that CORRECT used. (This avoids over-fitting which would result from a scaling model with different basis functions; a worry that I have when people use SCALA/AIMLESS after CORRECT without taking precautions.) The resulting scale factors are written to files MODPIX*.cbf, DECAY*.cbf, ABSORP*.cbf for inspection. Thirdly, it produces statistics and writes output files.

best,
Kay

On Wed, 12 Nov 2014 11:22:51 +0100, Tim Gruene t...@shelx.uni-ac.gwdg.de wrote:

Dear Wolfram Tempel,

there might be some confusion about terms. It is correct that xscale scales several data sets together. However, in crystallography, 'merging' might be the better term for this process. Crystallographic 'Scaling' is far more complicated than 'merging'. It applies correction factors which try to make up for experimental errors in your data set. These corrections include the sigma-values, which is particularly important for experimental phasing. In that respect it can actually hamper the data quality if you (crystallographically) scale your data twice, although the effect is rather subtle. CORRECT carries out these corrections, hence
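Kay's remark about beautifying precision indicators can be illustrated with a toy calculation. The sketch below computes Rmeas from groups of symmetry-equivalent observations using the usual multiplicity-weighted definition, sum over reflections of sqrt(n/(n-1)) times the summed absolute deviations from the group mean, divided by the summed intensities. The intensities are made up for illustration, the function name is hypothetical, and skipping singly measured reflections in both sums is a choice made here, not a statement about what XDS or AIMLESS do.

```python
import math


def r_meas(groups):
    """Multiplicity-corrected merging R-factor for groups of equivalent observations.

    groups: list of lists, each inner list holding the intensities measured
    for one unique reflection. Groups with fewer than two observations are
    skipped because they carry no information about the spread.
    """
    numerator = 0.0
    denominator = 0.0
    for observations in groups:
        n = len(observations)
        if n < 2:
            continue
        mean = sum(observations) / n
        numerator += math.sqrt(n / (n - 1)) * sum(abs(i - mean) for i in observations)
        denominator += sum(observations)
    return numerator / denominator


if __name__ == "__main__":
    # Invented intensities: the fourth observation of the first reflection is high.
    full = [[100.0, 104.0, 96.0, 130.0], [50.0, 52.0, 48.0, 49.0]]
    # The same data after discarding that observation purely to improve the statistic.
    trimmed = [[100.0, 104.0, 96.0], [50.0, 52.0, 48.0, 49.0]]
    print("Rmeas, all observations:", round(r_meas(full), 4))
    print("Rmeas, after rejection: ", round(r_meas(trimmed), 4))
    print("mean of first reflection:", sum(full[0]) / 4, "->", sum(trimmed[0]) / 3)
```

In this toy case the rejection lowers Rmeas and at the same time shifts the merged intensity of the first reflection. Nothing in the numbers says whether the discarded observation was wrong, so the better-looking statistic may have been bought with a biased mean, which is the trade of random error against systematic error described above.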
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Dear Graeme,

good that you set this straight. I consider getting the statistics output from AIMLESS a perfectly valid reason for going e), and as long as this is well-tested (which I'd bet in case of xia2) it's ok.

There is one issue I can see: 99% (obviously my guess could be wrong; just an estimate based on reading the Methods section of papers) of xia2 -3d users are not aware that their data then are _not_ scaled by AIMLESS. They see the AIMLESS tables and think "so it must have been AIMLESS that scaled the data". And they publish and PDB-deposit their misconception. This is how the misunderstanding spreads, which is then why I get asked "can CORRECT scale a data set?" and other misunderstandings along these lines ...

best,
Kay

On Thu, 13 Nov 2014 08:15:12, Graeme Winter graeme.win...@gmail.com wrote:

Dear Kay,

Just to comment on (e), since you say you don't know why anyone would want to do this, yet this is exactly what xia2 -3d does :o)

I use AIMLESS to merge data already scaled by XDS CORRECT or XSCALE as a way to get a report on the merging statistics which includes all of the AIMLESS analysis, and to generate harvesting files for deposition.

Like you, I look forward to studies of (a) - (e), and think that of all of these (c) is by far the worst idea, from gut instinct. There are also some subtleties to making (b) work properly...

For anyone who has time on their hands and would like to do this study, be sure to consider a range of crystal symmetries, as it is possible that some strategies which are safe in PG 422 (say) are not in PG 2.

Best wishes
Graeme
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Dear Kay, I cannot comment on the accuracy or otherwise of your 99%, but every time I talk about xia2 or write down what the options do, I try to make it clear that XDS / XSCALE is used for integration and scaling, then AIMLESS to merge the data. I have had an interest for a while in scaling the data with AIMLESS from INTEGRATE.HKL purely for the purpose of performing the analysis you described, but this would be a different option to xia2 *which does not yet exist*. If you have a way of avoiding misconceptions in users I am sure I will not be alone in my interest :o) and on a more practical note, if you think the description of how xia2 uses XDS / XSCALE can be improved I would welcome that. It does always list the appropriate references for users to cite at the end... Best wishes Graeme On Thu Nov 13 2014 at 8:35:20 AM Kay Diederichs kay.diederi...@uni-konstanz.de wrote: Dear Graeme, good that you set this straight. I consider getting the statistics output from AIMLESS is a perfectly valid reason for going e), and as long as this is well-tested (which I'd bet in case of xia2) it's ok. There is one issue I can see: 99% (obviously my guess could be wrong; just an estimate based on reading the Methods section of papers) of xia2 -3d users are not aware that their data then are _not_ scaled by AIMLESS. They see the AIMLESS tables and think so it must have been AIMLESS that scaled the data. And they publish and PDB-deposit their misconception. This is how the misunderstanding spreads, which is then why I get asked can CORRECT scale a data set? and other misunderstandings along these lines ...
best, Kay On Thu, 13 Nov 2014 08:15:12 +, Graeme Winter graeme.win...@gmail.com wrote: Dear Kay Just to comment on (e) since you say you don't know why anyone would want to do this, yet this is exactly what xia2 -3d does :o) I use AIMLESS to merge data already scaled by XDS CORRECT or XSCALE as a way to get a report on the merging statistics which includes all of the AIMLESS analysis, and to generate harvesting files for deposition. Like you, I look forward to studies of (a) - (e) think of all of these (c) is by far the worst idea, from gut instinct. There are also some subtleties to making (b) work properly... For anyone who has time on their hands would like to do this study, be sure to consider a range of crystal symmetries as it is possible that some strategies which are safe in PG 422 (say) are not in PG 2. Best wishes Graeme On Wed Nov 12 2014 at 10:07:10 PM Kay Diederichs kay.diederi...@uni-konstanz.de wrote: Hi Wolfram, it took me a while until I realized that you mean overfitting when you said o-word. You can abuse XDS in a number of ways, and I would call them overfitting the data although that would be using the word in a somewhat strained way: reducing WFAC1 below 1, decreasing REFLECTIONS/CORRECTION_FACTOR below 50 come to mind, but in an extended sense there are other ways: rejecting frames for no other reason than that they have low I/sigma or high Rmeas, ... People always seem to find ways to beautify their precision indicators, but they are just fooling themselves, because rejecting data just for cosmetic reasons creates bias. In other words, they trade random error against systematic error. Guess what is worse. A deeper reason of the problem is that crystallographers have been fixated on data R-factors for decades, and have become really spoilt by this. Our science has been completely mis-lead when it comes to data statistics, and is recovering only slowly. 
Concerning non-cautious use of SCALA/AIMLESS after CORRECT: actually I know of no systematic studies in this respect. But I know one thing: it is better to be critical with respect to recipes, than to follow them blindly. So I suggest the following project: compare SAD structure solution with the following routes a) INTEGRATE - CORRECT scaling - SHELXD b) INTEGRATE - AIMLESS scaling - SHELXD c) INTEGRATE - CORRECT+AIMLESS scaling - SHELXD d) INTEGRATE - CORRECT but scaling switched off - AIMLESS scaling - SHELXD e) INTEGRATE - CORRECT scaling - AIMLESS but scaling switched off - SHELXD and report here. You can add XSCALE into the mix but that won't change the picture, since it does the exact same calculations for multiple datasets as CORRECT does for single datasets. Personally, I don't understand why people would _want_ to do c),d) or e) because that's just added complexity, and additional sources of error. I'm looking forward to the results of such studies! Kay On Wed, 12 Nov 2014 12:41:28 -0500, wtempel wtem...@gmail.com wrote: Hello Kay, you said the o-word, and you are familiar with the inner workings of XDS. Has the data-to-parameter ratio in even complex scaling models become so small that a doubling (worst case) of model parameters would be a serious concern? Could
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Dear Wolfram Tempel, there might be some confusion about terms. It is correct that xscale scales several data sets together. However, in crystallography, 'merging' might be the better term for this process. Crystallographic 'Scaling' is far more complicated than 'merging'. It applies correction factors which try to make up for experimental errors in your data set. These corrections include the sigma-values, which is particularly important for experimental phasing. In that respect it can actually hamper the data quality if you (crystallographically) scale your data twice, although the effect is rather subtle. CORRECT carries out these corrections, hence CORRECT scales your data set, while XSCALE does not repeat this step - it only merges your data in the sense that it puts your data on a common scale. This is the application of a not too difficult mathematical formula (which is listed in the xds wiki, but I don't remember the URL). Regards, Tim On 11/11/2014 10:07 PM, Sudhir Babu Pothineni wrote: http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Xscale XSCALE http://www.mpimf-heidelberg.mpg.de/%7Ekabsch/xds/html_doc/xscale_parameters.html is the scaling program of the XDS suite. It scales reflection files (typically called XDS_ASCII.HKL) produced by XDS. Since the CORRECT step of XDS already scales an individual dataset, XSCALE is only /needed/ if several datasets should be scaled relative to another. However, it does not deterioriate a dataset if it is scaled again in XSCALE, since the supporting points of the scalefactors are at the same positions in detector and batch space. The advantage of using XSCALE for a single dataset is that the user can specify the limits of the resolution shells. 
_Scaling with scala/aimless_ http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Scaling_with_SCALA_%28or_better:_aimless%29 -Sudhir *** Sudhir Babu Pothineni GM/CA @ APS 436D Argonne National Laboratory 9700 S Cass Ave Argonne IL 60439 Ph : 630 252 0672 On 11/11/14 14:42, wtempel wrote: Thank you Boaz. So if CORRECT can do a fully corrected scaling, are there no corrections that XSCALE might apply to XDS_ASCII.HKL data that are beyond CORRECT's capabilities? Wolfram On Tue, Nov 11, 2014 at 3:05 PM, Boaz Shaanan bshaa...@bgu.ac.il mailto:bshaa...@bgu.ac.il wrote: Hi, I actually choose the option 'constant' further down in the aimless gui but I guess the effect is similar to 'onlymege'. Boaz /Boaz Shaanan, Ph.D. Dept. of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel E-mail: bshaa...@bgu.ac.il mailto:bshaa...@bgu.ac.il Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710 / // // / / *From:* CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK mailto:CCP4BB@JISCMAIL.AC.UK] on behalf of wtempel [wtem...@gmail.com mailto:wtem...@gmail.com] *Sent:* Tuesday, November 11, 2014 9:50 PM *To:* CCP4BB@JISCMAIL.AC.UK mailto:CCP4BB@JISCMAIL.AC.UK *Subject:* [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS Hello all, in a discussion https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1307L=CCP4BBH=1P=186901 on this board, Kay Diederichs questioned the effect of scaling data in AIMLESS after prior scaling in XDS (CORRECT). I understand that the available alternatives in this work flow are to specify the AIMLESS ‘onlymerge’ command, or not. Are there any arguments for the preference of one alternative over the other? Thank you for your insights, Wolfram Tempel - -- - -- Dr Tim Gruene Institut fuer anorganische Chemie Tammannstr. 
4 D-37077 Goettingen GPG Key ID = A46BEE1A
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Hi Tim, this is incorrect. XSCALE determines the relative scale and B in a first step (this is what you describe). It then, in a second step, re-determines all scale factors (exactly as CORRECT does for the individual data sets), at the exact same supporting points that CORRECT used. (This avoids over-fitting which would result from a scaling model with different basis functions; a worry that I have when people use SCALA/AIMLESS after CORRECT without taking precautions.) The resulting scale factors are written to files MODPIX*.cbf, DECAY*.cbf, ABSORP*.cbf for inspection. Thirdly, it produces statistics and writes output files. best, Kay On Wed, 12 Nov 2014 11:22:51 +0100, Tim Gruene t...@shelx.uni-ac.gwdg.de wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Dear Wolfram Tempel, there might be some confusion about terms. It is correct that xscale scales several data sets together. However, in crystallography, 'merging' might be the better term for this process. Crystallographic 'Scaling' is far more complicated than 'merging'. It applies correction factors which try to make up for experimental errors in your data set. These corrections include the sigma-values, which is particularly important for experimental phasing. In that respect it can actually hamper the data quality if you (crystallographically) scale your data twice, although the effect is rather subtle. CORRECT carries out these corrections, hence CORRECT scales your data set, while XSCALE does not repeat this step - it only merges your data in the sense that it puts your data on a common scale. This is the application of a not too difficult mathematical formula (which is listed in the xds wiki, but I don't remember the URL). Regards, Tim On 11/11/2014 10:07 PM, Sudhir Babu Pothineni wrote: http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Xscale XSCALE http://www.mpimf-heidelberg.mpg.de/%7Ekabsch/xds/html_doc/xscale_parameters.html is the scaling program of the XDS suite. 
It scales reflection files (typically called XDS_ASCII.HKL) produced by XDS. Since the CORRECT step of XDS already scales an individual dataset, XSCALE is only /needed/ if several datasets should be scaled relative to another. However, it does not deterioriate a dataset if it is scaled again in XSCALE, since the supporting points of the scalefactors are at the same positions in detector and batch space. The advantage of using XSCALE for a single dataset is that the user can specify the limits of the resolution shells. _Scaling with scala/aimless_ http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Scaling_with_SCALA_%28or_better:_aimless%29 -Sudhir *** Sudhir Babu Pothineni GM/CA @ APS 436D Argonne National Laboratory 9700 S Cass Ave Argonne IL 60439 Ph : 630 252 0672 On 11/11/14 14:42, wtempel wrote: Thank you Boaz. So if CORRECT can do a fully corrected scaling, are there no corrections that XSCALE might apply to XDS_ASCII.HKL data that are beyond CORRECT's capabilities? Wolfram On Tue, Nov 11, 2014 at 3:05 PM, Boaz Shaanan bshaa...@bgu.ac.il mailto:bshaa...@bgu.ac.il wrote: Hi, I actually choose the option 'constant' further down in the aimless gui but I guess the effect is similar to 'onlymege'. Boaz /Boaz Shaanan, Ph.D. Dept. 
of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel E-mail: bshaa...@bgu.ac.il Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710 From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of wtempel [wtem...@gmail.com] Sent: Tuesday, November 11, 2014 9:50 PM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS Hello all, in a discussion https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1307L=CCP4BBH=1P=186901 on this board, Kay Diederichs questioned the effect of scaling data in AIMLESS after prior scaling in XDS (CORRECT). I understand that the available alternatives in this work flow are to specify the AIMLESS ‘onlymerge’ command, or not. Are there any arguments for the preference of one alternative over the other? Thank you for your insights, Wolfram Tempel -- Dr Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen GPG Key ID = A46BEE1A
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
On Wed, 12 Nov 2014 15:32:04 +, Kay Diederichs kay.diederi...@uni-konstanz.de wrote: ... It then, in a second step, re-determines all scale factors (exactly as CORRECT does for the individual data sets), at the exact same supporting points that CORRECT used. (This avoids over-fitting which would result from a scaling model with different basis functions; a worry that I have when people use SCALA/AIMLESS after CORRECT without taking precautions.) The resulting scale factors are written to files MODPIX*.cbf, DECAY*.cbf, ABSORP*.cbf for inspection.

Maybe needless to add, but I'll write it nevertheless: XSCALE _also_ adjusts the error model in this step, and adjusts the sigmas accordingly.

Kay
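As a rough illustration of what "adjusting the sigmas" can mean: the sketch below assumes the two-parameter error model that CORRECT.LP describes, in which the variance v0(I) from profile fitting is replaced by v(I) = a*(v0(I) + b*I^2). The function names and the values of a and b are made up for this example, and nothing here is XSCALE's actual code.

```python
import math


def adjusted_sigma(intensity, sigma0, a, b):
    """Return sigma after applying a two-parameter error model.

    Assumed form: v(I) = a * (v0(I) + b * I**2), with v0 = sigma0**2.
    """
    variance = a * (sigma0 ** 2 + b * intensity ** 2)
    return math.sqrt(variance)


def asymptotic_i_over_sigma(a, b):
    """Limit of I/sigma for very strong reflections under this model: 1/sqrt(a*b)."""
    return 1.0 / math.sqrt(a * b)


# Hypothetical parameter values, chosen only for illustration.
a, b = 1.0, 0.0004
print(adjusted_sigma(1000.0, 10.0, a, b))
print(asymptotic_i_over_sigma(a, b))
```

With these made-up values a strong reflection has its sigma inflated well beyond the counting-statistics estimate, which is why a change in the error model matters for anything that weights by sigma, experimental phasing included.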
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Hi Kay, thank you for the clarification. I had understood that using XSCALE after CORRECT does no harm, but did not understand that the reason lies in the consistent choice of support points rather than not repeating what might already have been done. Regards, Tim On 11/12/2014 04:32 PM, Kay Diederichs wrote: Hi Tim, this is incorrect. XSCALE determines the relative scale and B in a first step (this is what you describe). It then, in a second step, re-determines all scale factors (exactly as CORRECT does for the individual data sets), at the exact same supporting points that CORRECT used. (This avoids over-fitting which would result from a scaling model with different basis functions; a worry that I have when people use SCALA/AIMLESS after CORRECT without taking precautions.) The resulting scale factors are written to files MODPIX*.cbf, DECAY*.cbf, ABSORP*.cbf for inspection. Thirdly, it produces statistics and writes output files. best, Kay On Wed, 12 Nov 2014 11:22:51 +0100, Tim Gruene t...@shelx.uni-ac.gwdg.de wrote: Dear Wolfram Tempel, there might be some confusion about terms. It is correct that xscale scales several data sets together. However, in crystallography, 'merging' might be the better term for this process. Crystallographic 'Scaling' is far more complicated than 'merging'. It applies correction factors which try to make up for experimental errors in your data set. These corrections include the sigma-values, which is particularly important for experimental phasing. In that respect it can actually hamper the data quality if you (crystallographically) scale your data twice, although the effect is rather subtle. CORRECT carries out these corrections, hence CORRECT scales your data set, while XSCALE does not repeat this step - it only merges your data in the sense that it puts your data on a common scale. This is the application of a not too difficult mathematical formula (which is listed in the xds wiki, but I don't remember the URL).
Regards, Tim On 11/11/2014 10:07 PM, Sudhir Babu Pothineni wrote: http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Xscale XSCALE http://www.mpimf-heidelberg.mpg.de/%7Ekabsch/xds/html_doc/xscale_parameters.html is the scaling program of the XDS suite. It scales reflection files (typically called XDS_ASCII.HKL) produced by XDS. Since the CORRECT step of XDS already scales an individual dataset, XSCALE is only /needed/ if several datasets should be scaled relative to another. However, it does not deterioriate a dataset if it is scaled again in XSCALE, since the supporting points of the scalefactors are at the same positions in detector and batch space. The advantage of using XSCALE for a single dataset is that the user can specify the limits of the resolution shells. _Scaling with scala/aimless_ http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Scaling_with_SCALA_%28or_better:_aimless%29 -Sudhir *** Sudhir Babu Pothineni GM/CA @ APS 436D Argonne National Laboratory 9700 S Cass Ave Argonne IL 60439 Ph : 630 252 0672 On 11/11/14 14:42, wtempel wrote: Thank you Boaz. So if CORRECT can do a fully corrected scaling, are there no corrections that XSCALE might apply to XDS_ASCII.HKL data that are beyond CORRECT's capabilities? Wolfram On Tue, Nov 11, 2014 at 3:05 PM, Boaz Shaanan bshaa...@bgu.ac.il mailto:bshaa...@bgu.ac.il wrote: Hi, I actually choose the option 'constant' further down in the aimless gui but I guess the effect is similar to 'onlymege'. Boaz /Boaz Shaanan, Ph.D. Dept. 
of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel E-mail: bshaa...@bgu.ac.il Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710 From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of wtempel [wtem...@gmail.com] Sent: Tuesday, November 11, 2014 9:50 PM To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS Hello all, in a discussion https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1307L=CCP4BBH=1P=186901 on this board, Kay Diederichs questioned the effect of scaling data in AIMLESS after prior scaling in XDS (CORRECT). I understand that the available alternatives in this work flow are to specify the AIMLESS ‘onlymerge’ command, or not. Are there any arguments for the preference of one alternative over the other? Thank you for your insights, Wolfram Tempel -- Dr Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen GPG Key ID = A46BEE1A
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Hello Kay, you said the o-word, and you are familiar with the inner workings of XDS. Has the data-to-parameter ratio in even complex scaling models become so small that a doubling (worst case) of model parameters would be a serious concern? Could one detect such overfitting by, say, comparing (molecular) model R-factors between refinement against the once (CORRECT) scaled or twice (CORRECT+AIMLESS) scaled data? Thank you, Wolfram On Wed, Nov 12, 2014 at 10:32 AM, Kay Diederichs kay.diederi...@uni-konstanz.de wrote: Hi Tim, this is incorrect. XSCALE determines the relative scale and B in a first step (this is what you describe). It then, in a second step, re-determines all scale factors (exactly as CORRECT does for the individual data sets), at the exact same supporting points that CORRECT used. (This avoids over-fitting which would result from a scaling model with different basis functions; a worry that I have when people use SCALA/AIMLESS after CORRECT without taking precautions.) The resulting scale factors are written to files MODPIX*.cbf, DECAY*.cbf, ABSORP*.cbf for inspection. Thirdly, it produces statistics and writes output files. best, Kay On Wed, 12 Nov 2014 11:22:51 +0100, Tim Gruene t...@shelx.uni-ac.gwdg.de wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Dear Wolfram Tempel, there might be some confusion about terms. It is correct that xscale scales several data sets together. However, in crystallography, 'merging' might be the better term for this process. Crystallographic 'Scaling' is far more complicated than 'merging'. It applies correction factors which try to make up for experimental errors in your data set. These corrections include the sigma-values, which is particularly important for experimental phasing. In that respect it can actually hamper the data quality if you (crystallographically) scale your data twice, although the effect is rather subtle. 
CORRECT carries out these corrections, hence CORRECT scales your data set, while XSCALE does not repeat this step - it only merges your data in the sense that it puts your data on a common scale. This is the application of a not too difficult mathematical formula (which is listed in the xds wiki, but I don't remember the URL). Regards, Tim On 11/11/2014 10:07 PM, Sudhir Babu Pothineni wrote: http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Xscale XSCALE http://www.mpimf-heidelberg.mpg.de/%7Ekabsch/xds/html_doc/xscale_parameters.html is the scaling program of the XDS suite. It scales reflection files (typically called XDS_ASCII.HKL) produced by XDS. Since the CORRECT step of XDS already scales an individual dataset, XSCALE is only /needed/ if several datasets should be scaled relative to another. However, it does not deterioriate a dataset if it is scaled again in XSCALE, since the supporting points of the scalefactors are at the same positions in detector and batch space. The advantage of using XSCALE for a single dataset is that the user can specify the limits of the resolution shells. _Scaling with scala/aimless_ http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Scaling_with_SCALA_%28or_better:_aimless%29 -Sudhir *** Sudhir Babu Pothineni GM/CA @ APS 436D Argonne National Laboratory 9700 S Cass Ave Argonne IL 60439 Ph : 630 252 0672 On 11/11/14 14:42, wtempel wrote: Thank you Boaz. So if CORRECT can do a fully corrected scaling, are there no corrections that XSCALE might apply to XDS_ASCII.HKL data that are beyond CORRECT's capabilities? Wolfram On Tue, Nov 11, 2014 at 3:05 PM, Boaz Shaanan bshaa...@bgu.ac.il mailto:bshaa...@bgu.ac.il wrote: Hi, I actually choose the option 'constant' further down in the aimless gui but I guess the effect is similar to 'onlymege'. Boaz /Boaz Shaanan, Ph.D. Dept. 
of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel E-mail: bshaa...@bgu.ac.il mailto:bshaa...@bgu.ac.il Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710 / // // / / *From:* CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK mailto:CCP4BB@JISCMAIL.AC.UK] on behalf of wtempel [wtem...@gmail.com mailto:wtem...@gmail.com] *Sent:* Tuesday, November 11, 2014 9:50 PM *To:* CCP4BB@JISCMAIL.AC.UK mailto:CCP4BB@JISCMAIL.AC.UK *Subject:* [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS Hello all, in a discussion https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1307L=CCP4BBH=1P=186901 on this board, Kay Diederichs questioned the effect of scaling data in AIMLESS after prior scaling in XDS (CORRECT). I understand that the available alternatives in this work flow are to specify the AIMLESS ‘onlymerge’ command, or not. Are there any
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Hi Wolfram, it took me a while until I realized that you meant "overfitting" when you said "o-word".

You can abuse XDS in a number of ways, and I would call them overfitting the data although that would be using the word in a somewhat strained way: reducing WFAC1 below 1 and decreasing REFLECTIONS/CORRECTION_FACTOR below 50 come to mind, but in an extended sense there are other ways: rejecting frames for no other reason than that they have low I/sigma or high Rmeas, ... People always seem to find ways to beautify their precision indicators, but they are just fooling themselves, because rejecting data just for cosmetic reasons creates bias. In other words, they trade random error against systematic error. Guess what is worse.

A deeper reason for the problem is that crystallographers have been fixated on data R-factors for decades, and have become really spoilt by this. Our science has been completely misled when it comes to data statistics, and is recovering only slowly.

Concerning non-cautious use of SCALA/AIMLESS after CORRECT: actually I know of no systematic studies in this respect. But I know one thing: it is better to be critical with respect to recipes than to follow them blindly. So I suggest the following project: compare SAD structure solution with the following routes

a) INTEGRATE - CORRECT scaling - SHELXD
b) INTEGRATE - AIMLESS scaling - SHELXD
c) INTEGRATE - CORRECT+AIMLESS scaling - SHELXD
d) INTEGRATE - CORRECT but scaling switched off - AIMLESS scaling - SHELXD
e) INTEGRATE - CORRECT scaling - AIMLESS but scaling switched off - SHELXD

and report here. You can add XSCALE into the mix but that won't change the picture, since it does the exact same calculations for multiple datasets as CORRECT does for single datasets.

Personally, I don't understand why people would _want_ to do c), d) or e) because that's just added complexity, and additional sources of error. I'm looking forward to the results of such studies!
Kay On Wed, 12 Nov 2014 12:41:28 -0500, wtempel wtem...@gmail.com wrote: Hello Kay, you said the o-word, and you are familiar with the inner workings of XDS. Has the data-to-parameter ratio in even complex scaling models become so small that a doubling (worst case) of model parameters would be a serious concern? Could one detect such overfitting by, say, comparing (molecular) model R-factors between refinement against the once (CORRECT) scaled or twice (CORRECT+AIMLESS) scaled data? Thank you, Wolfram On Wed, Nov 12, 2014 at 10:32 AM, Kay Diederichs kay.diederi...@uni-konstanz.de wrote: Hi Tim, this is incorrect. XSCALE determines the relative scale and B in a first step (this is what you describe). It then, in a second step, re-determines all scale factors (exactly as CORRECT does for the individual data sets), at the exact same supporting points that CORRECT used. (This avoids over-fitting which would result from a scaling model with different basis functions; a worry that I have when people use SCALA/AIMLESS after CORRECT without taking precautions.) The resulting scale factors are written to files MODPIX*.cbf, DECAY*.cbf, ABSORP*.cbf for inspection. Thirdly, it produces statistics and writes output files. best, Kay On Wed, 12 Nov 2014 11:22:51 +0100, Tim Gruene t...@shelx.uni-ac.gwdg.de wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Dear Wolfram Tempel, there might be some confusion about terms. It is correct that xscale scales several data sets together. However, in crystallography, 'merging' might be the better term for this process. Crystallographic 'Scaling' is far more complicated than 'merging'. It applies correction factors which try to make up for experimental errors in your data set. These corrections include the sigma-values, which is particularly important for experimental phasing. In that respect it can actually hamper the data quality if you (crystallographically) scale your data twice, although the effect is rather subtle. 
CORRECT carries out these corrections, hence CORRECT scales your data set, while XSCALE does not repeat this step - it only merges your data in the sense that it puts your data on a common scale. This is the application of a not too difficult mathematical formula (which is listed in the xds wiki, but I don't remember the URL). Regards, Tim On 11/11/2014 10:07 PM, Sudhir Babu Pothineni wrote: http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Xscale XSCALE http://www.mpimf-heidelberg.mpg.de/%7Ekabsch/xds/html_doc/xscale_parameters.html is the scaling program of the XDS suite. It scales reflection files (typically called XDS_ASCII.HKL) produced by XDS. Since the CORRECT step of XDS already scales an individual dataset, XSCALE is only /needed/ if several datasets should be scaled relative to another. However, it does not deterioriate a dataset if it is scaled again in XSCALE, since the supporting points of the scalefactors
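Kay's point in this message, that rejecting data for cosmetic reasons trades random error against systematic error, can be seen in a toy simulation. The sketch below is not XDS and does not model frames or a real rejection criterion; it simply draws repeated measurements of one intensity with a fixed sigma and throws away the fraction with the lowest I/sigma. All names and numbers are assumptions for illustration.

```python
import random
import statistics


def simulate(n_obs=1000, true_intensity=100.0, sigma=10.0,
             reject_fraction=0.2, seed=0):
    """Compare statistics before and after a purely cosmetic rejection."""
    rng = random.Random(seed)
    observations = [rng.gauss(true_intensity, sigma) for _ in range(n_obs)]

    # "Cosmetic" rejection: discard the observations with the lowest I/sigma.
    # sigma is the same for every observation, so these are simply the weakest.
    n_reject = int(n_obs * reject_fraction)
    kept = sorted(observations)[n_reject:]

    return {
        "mean_all": statistics.mean(observations),
        "spread_all": statistics.stdev(observations),
        "mean_kept": statistics.mean(kept),
        "spread_kept": statistics.stdev(kept),
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The spread of the kept observations is smaller, so a precision indicator computed from them looks better, but their mean has moved away from the true value: the random error went down and a systematic error took its place.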
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Hi, I actually choose the option 'constant' further down in the AIMLESS GUI, but I guess the effect is similar to 'onlymerge'.

Boaz

Boaz Shaanan, Ph.D. Dept. of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel E-mail: bshaa...@bgu.ac.il Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of wtempel [wtem...@gmail.com]
Sent: Tuesday, November 11, 2014 9:50 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS

Hello all, in a discussion on this board, Kay Diederichs questioned the effect of scaling data in AIMLESS after prior scaling in XDS (CORRECT). I understand that the available alternatives in this work flow are to specify the AIMLESS ‘onlymerge’ command, or not. Are there any arguments for the preference of one alternative over the other? Thank you for your insights, Wolfram Tempel
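For readers who drive these programs from a script rather than the GUI, the alternatives discussed here might be sketched as follows. This assumes the usual CCP4 command-line style (`pointless XDSIN ... HKLOUT ...`, `aimless HKLIN ... HKLOUT ...`, keywords on standard input) and that `onlymerge` and `scales constant` are the keyword spellings of the two options mentioned; please check the AIMLESS documentation before relying on either. The output file names and the Python function names are invented for the example, and by default nothing is executed.

```python
import subprocess


def aimless_keywords(mode):
    """Return AIMLESS keyword input for a given treatment of XDS-scaled data.

    mode: "onlymerge" -> merge only, no scaling
          "constant"  -> a single constant scale (the GUI option 'constant')
          "rescale"   -> no extra keywords, AIMLESS scales with its defaults
    """
    keywords = {
        "onlymerge": ["onlymerge"],
        "constant": ["scales constant"],
        "rescale": [],
    }
    if mode not in keywords:
        raise ValueError(f"unknown mode: {mode}")
    return "\n".join(keywords[mode] + ["end"]) + "\n"


def run_merge(xds_file="XDS_ASCII.HKL", mode="onlymerge", dry_run=True):
    """Build (and optionally run) POINTLESS followed by AIMLESS."""
    # Hypothetical intermediate and output file names.
    pointless_cmd = ["pointless", "XDSIN", xds_file, "HKLOUT", "pointless.mtz"]
    aimless_cmd = ["aimless", "HKLIN", "pointless.mtz", "HKLOUT", "merged.mtz"]
    stdin_text = aimless_keywords(mode)
    if not dry_run:
        subprocess.run(pointless_cmd, input="", text=True, check=True)
        subprocess.run(aimless_cmd, input=stdin_text, text=True, check=True)
    return pointless_cmd, aimless_cmd, stdin_text


commands = run_merge()
print(" ".join(commands[0]))
print(" ".join(commands[1]))
print(commands[2], end="")
```

Keeping the mode as an explicit argument makes it obvious in a processing log whether AIMLESS only merged the data or also rescaled them, which is the distinction this thread is about.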
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
Thank you Boaz. So if CORRECT can do a fully corrected scaling, are there no corrections that XSCALE might apply to XDS_ASCII.HKL data that are beyond CORRECT's capabilities?

Wolfram

On Tue, Nov 11, 2014 at 3:05 PM, Boaz Shaanan bshaa...@bgu.ac.il wrote:

Hi, I actually choose the option 'constant' further down in the AIMLESS GUI, but I guess the effect is similar to 'onlymerge'.

Boaz

Boaz Shaanan, Ph.D. Dept. of Life Sciences Ben-Gurion University of the Negev Beer-Sheva 84105 Israel E-mail: bshaa...@bgu.ac.il Phone: 972-8-647-2220 Skype: boaz.shaanan Fax: 972-8-647-2992 or 972-8-646-1710

From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of wtempel [wtem...@gmail.com]
Sent: Tuesday, November 11, 2014 9:50 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS

Hello all, in a discussion https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1307L=CCP4BBH=1P=186901 on this board, Kay Diederichs questioned the effect of scaling data in AIMLESS after prior scaling in XDS (CORRECT). I understand that the available alternatives in this work flow are to specify the AIMLESS ‘onlymerge’ command, or not. Are there any arguments for the preference of one alternative over the other? Thank you for your insights, Wolfram Tempel
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Xscale

XSCALE (http://www.mpimf-heidelberg.mpg.de/%7Ekabsch/xds/html_doc/xscale_parameters.html) is the scaling program of the XDS suite. It scales reflection files (typically called XDS_ASCII.HKL) produced by XDS. Since the CORRECT step of XDS already scales an individual dataset, XSCALE is only /needed/ if several datasets should be scaled relative to one another. However, it does not deteriorate a dataset if it is scaled again in XSCALE, since the supporting points of the scale factors are at the same positions in detector and batch space. The advantage of using XSCALE for a single dataset is that the user can specify the limits of the resolution shells.

Scaling with scala/aimless: http://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Scaling_with_SCALA_%28or_better:_aimless%29

-Sudhir

Sudhir Babu Pothineni GM/CA @ APS 436D Argonne National Laboratory 9700 S Cass Ave Argonne IL 60439 Ph : 630 252 0672
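The single-dataset XSCALE case described in the wiki excerpt (rescaling one XDS_ASCII.HKL mainly to choose the resolution shells) could be set up roughly as follows. This is only a sketch: the helper name build_xscale_inp, the output file name scaled.ahkl and the shell limits are hypothetical, and the keywords RESOLUTION_SHELLS=, OUTPUT_FILE= and INPUT_FILE= are written as they are understood from the XSCALE parameter documentation linked above, so they should be checked against the installed XDS version.

```python
from typing import Sequence


def build_xscale_inp(
    input_file: str = "XDS_ASCII.HKL",
    output_file: str = "scaled.ahkl",      # hypothetical output name
    resolution_shells: Sequence[float] = (),
) -> str:
    """Return the text of a minimal XSCALE.INP for a single dataset."""
    lines = []
    if resolution_shells:
        # Shell limits are written from low to high resolution,
        # i.e. in decreasing order of d-spacing (assumed convention).
        shells = sorted(resolution_shells, reverse=True)
        lines.append("RESOLUTION_SHELLS= " + " ".join(f"{d:g}" for d in shells))
    # One output file, followed by the single input dataset.
    lines.append(f"OUTPUT_FILE={output_file}")
    lines.append(f"INPUT_FILE={input_file}")
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file named XSCALE.INP in an empty directory and then running xscale there would be the usual next step; without resolution_shells the sketch leaves the shell limits to the program.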
Re: [ccp4bb] To scale or not to scale: XDS_ASCII.HKL input to POINTLESS/AIMLESS
You can take XDS data into Pointless and Aimless (the CCP4 Data Reduction task) either from the unscaled INTEGRATE.HKL or from the scaled XDS_ASCII.HKL file (or files). In the case of a single XDS_ASCII.HKL you don't need to rescale it in Aimless, though you can if you want. Aimless uses a similar but not identical scaling model to XDS, which may be better or worse (and how do you judge?). Phil
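The alternatives discussed in this thread (merge only, a constant scale, or a full rescale in Aimless) differ only in the keywords passed to Aimless. A minimal sketch, under these assumptions: the function names and the file names pointless.mtz and merged.mtz are hypothetical; XDSIN, HKLIN, HKLOUT, ONLYMERGE and SCALES CONSTANT are used as they are understood from the Pointless and Aimless documentation; and "SCALES CONSTANT" is only assumed to correspond to the GUI option 'constant' mentioned earlier. The code builds the commands without running them.

```python
from typing import List, Tuple


def aimless_keywords(mode: str) -> str:
    """Keyword text to feed to Aimless on standard input for a given mode."""
    if mode == "onlymerge":
        # Merge only: keep the scales already applied by CORRECT.
        return "ONLYMERGE\n"
    if mode == "constant":
        # Assumed equivalent of the GUI option 'constant'.
        return "SCALES CONSTANT\n"
    if mode == "rescale":
        # No keywords: leave Aimless to apply its own default scaling model.
        return ""
    raise ValueError(f"unknown mode: {mode!r}")


def reduction_commands(xds_file: str, mode: str) -> Tuple[List[str], List[str], str]:
    """Return (pointless argv, aimless argv, aimless stdin text)."""
    # Pointless reads the XDS reflection file and writes an unmerged MTZ file.
    pointless = ["pointless", "XDSIN", xds_file, "HKLOUT", "pointless.mtz"]
    # Aimless reads that MTZ file and writes the merged result.
    aimless = ["aimless", "HKLIN", "pointless.mtz", "HKLOUT", "merged.mtz"]
    return pointless, aimless, aimless_keywords(mode)
```

With CCP4 set up, the two argument lists could be passed to subprocess.run, giving the keyword text to Aimless through its input argument. Running the "onlymerge" and "rescale" variants on the same XDS_ASCII.HKL and comparing the merging statistics is one way to approach the "how do you judge?" question.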