Re: [ccp4bb] problem of conventions
Bernhard,

> Well, it *IS* broke.

As they say, "it works for me", so either you're using a different set of programs from me, or you're using the same programs but in a different way. Perhaps you could be more specific as to which program(s) appear to be broken? If possible please post the logfile(s) on this forum, then someone might recognise the problem(s). Did you try reporting it to CCP4 (assuming of course we're talking about CCP4 programs)? You're the 2nd person in this thread to claim that the space-group handling for the alternate settings is broken, so it would be nice to get to the bottom of it!

> If you are running some type of process, as you implied in referring to LIMS, then there is a step in which you move from the crystal system and point group to the actual space group. So, at that point you identify P22121. The next clear step, automatically by software, is to convert to P21212, and move on. That doesn't take an enormous amount of code writing, and you have a clear trail on how you got there.

I'm puzzled why I need a workaround for a bug that only you and possibly James have experienced: AFAIK no-one else has reported problems with this recently. Wouldn't it make more sense to fix the bug(s)? - that way, everyone benefits and I don't need to do anything!

Anyway, to respond to your suggestion: I've spent some time looking into this (so I hope you'll forgive the delay in replying!), and unfortunately it's not as simple as you think. I can see 3 main steps that would be required for a workaround:

Step 1 (create new crystal form entry): First I would have to make a copy of the entry for the old crystal form in the PROTEINS table, giving it a new unique ID. Then I would perform the re-indexing/re-orientation operations on the reference free-R MTZ files and the PDB file for the refined structure, and change the filename entries in the row of the PROTEINS table just created to point to them.
This row also contains the parameters for MR, rigid-body refinement, TLS and binding site definitions but these won't need to be changed. The user interface would need to be modified to give users the option of implementing this change, since I know some (most?) users won't be happy to do it!

One problem I foresee is confusing the users with a multiplicity of unit cells, since we already work with potentially 2 different cells per crystal: first the 'canonical' unit cell for the crystal form from the reference MTZ file header; then there's the unit cell for the isomorphous crystal as found by the indexing software. Users understand that the indexing program won't necessarily choose the reference cell, particularly in the situation you indicate below where 2 cell lengths are almost equal. Now you want me to add a 3rd possibly different unit cell, i.e. that after a second run of re-indexing to the 'standard setting'; the users won't understand the need for this.

Next comes a tricky bit: for tracking purposes I would somehow need to make a link from the new crystal form to the old one; my guess is with a self-referencing foreign key. All the database applications for doing searches and reports would need to be modified to recognise this change. This doesn't look trivial to me! I would need to hand this task over to the database administrators and programmers, since I'm not involved with administration of the database. Getting a clear trail doesn't happen automatically, it has to be programmed!

I anticipate some searching questions from all the users and the db admin, such as "why do we need to do this?", "what bad things will happen if we don't?" and "why haven't we seen these bad things happening before?". I'm hoping that you will be able to provide convincing answers to these questions - because I can't!
Step 2 (re-index historical data): Then I would need to copy each entry for the historical datasets that were previously added to the database for the old crystal form to the new crystal form (of course it's actually the _same_ crystal form, but we're fooling the LIMS into treating it as though it were a new one). This is so that we can continue to track the data using the new crystal form ID. All datasets for a given crystal form must be indexed in the same way since the LIMS interface allows you to mix and match PDB, MTZ and MAP files for the crystal form without the need to do superpositions (of course superpositions can be done if needed, but then you lose the symmetry info).

These 'historical' datasets are all the ones generated in the process of getting and optimising the crystal form, i.e. from all the different constructs made (typically ~ 30 +- 20), the purifications and crystallisation trials, optimising the cryobuffer and DMSO concentration for soaking ligands, then the datasets used during the structure determination (MR/MAD/SAD etc). This may run to 100-150 datasets, but the actual number is immaterial since it's just as easy to write the database application for many as for one. So a
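The "self-referencing foreign key" idea from Step 1 above can be sketched as follows. This is only a minimal illustration using Python's standard sqlite3 module: the column names (`space_group`, `ref_mtz`, `derived_from`) and the file names are hypothetical, and nothing here reflects the schema of the actual LIMS or its PROTEINS table.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: 'derived_from' points back at another row of the
# same table, i.e. a self-referencing foreign key.
conn.execute(
    """
    CREATE TABLE proteins (
        id           INTEGER PRIMARY KEY,
        space_group  TEXT NOT NULL,
        ref_mtz      TEXT,
        derived_from INTEGER REFERENCES proteins(id)
    )
    """
)

# The existing crystal form, indexed in the a<b<c setting.
old_id = conn.execute(
    "INSERT INTO proteins (space_group, ref_mtz) VALUES (?, ?)",
    ("P22121", "old_ref.mtz"),
).lastrowid

# The copy registered as a 'new' crystal form after re-indexing,
# linked back to the entry it was derived from.
new_id = conn.execute(
    "INSERT INTO proteins (space_group, ref_mtz, derived_from) VALUES (?, ?, ?)",
    ("P21212", "reindexed_ref.mtz", old_id),
).lastrowid

# Follow the trail from the new form back to the old one.
row = conn.execute(
    """
    SELECT new.space_group, old.space_group
    FROM proteins AS new JOIN proteins AS old ON new.derived_from = old.id
    WHERE new.id = ?
    """,
    (new_id,),
).fetchone()
print(row)
```

The link itself is one extra column; the cost described in the message lies in every search and report application that would have to learn to follow it.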
Re: [ccp4bb] problem of conventions
On Fri, Apr 1, 2011 at 5:30 AM, Santarsiero, Bernard D. b...@uic.edu wrote:

> Ian, I think it's amazing that we can program computers to resolve a<b<c but it would be a major undertaking to store the matrix transformations for 22121 to 21212 and reindex a cell to a standard setting.

I think you misunderstood the point I was making. Multiply your one by the several hundred datasets we sometimes collect for the various clones and crystallisation conditions needed to optimise the crystal form for soaking - that's what I mean by 'major undertaking'. As I explained, all the datasets collected for a given crystal form have to be indexed the same way (even if only for archival purposes) before we can store them in the database (otherwise we would end up in an awful muddle!). I don't have a batch script to filter all the relevant datasets from the database, re-index each one (that's the easy part!), and re-register them all as a new crystal form. Why should I? - no-one has given me a cogent reason to re-index them in the first place which would justify the resulting downtime of the project (OK call me lazy!). I hope you see that doing each one manually is a non-starter: the project would have to be locked during the period of the operation so no new datasets could be down- or uploaded (which would further cause the upstream pipeline to back up). Operations that appear trivial when you only have to do them once suddenly become big problems when they have to be performed on an industrial scale!

> I was also told that I was lazy to not reindex to the standard setting when I was a grad student. Now it takes less than a minute to enter a transformation and re-index.

They told you wrong! The conventional cell is the convention (by definition!), and the standard setting doesn't always correspond to the conventional cell (though in most cases it does). There's a reason for the distinction between the meanings of 'standard' and 'conventional' - the meanings are very precise and non-interchangeable.

> The orthorhombic rule of a<b<c makes sense in 222 or 212121, but when there is a standard setting of the 2-fold along the c-axis, then why not adopt that?

As I explained, sometimes we don't know the true space group (in terms of assigning the screw axes) until further along the pipeline (e.g. after MR or refinement), or at least it's always safer to be non-committal beyond P222 - why commit oneself to an irrevocable decision before it's absolutely necessary? You don't need to know the exact space group just to screen crystals for diffracting power! Adopting the standard setting would in the particular case of SGs 5, 17 and 18 require later re-indexing, and I hope you see why for us that's a non-starter.

I'm not a believer in conventions for their own sake - a convention is merely a default set of rules which you apply when you have no sound basis on which to make a choice - the convention makes what is effectively a totally arbitrary choice for you. Conventions do have the advantage that if other people follow them then they will make the same decisions as you. The moment I have sufficient justification (e.g. as I said isomorphism overrides convention) to break with convention then I would have no hesitation in doing so. The fact that the standard setting has a 2-fold along c is merely an arbitrary choice and doesn't seem to me to be a good enough reason to break with the unit-cell convention.

-- Ian

On Thu, March 31, 2011 5:48 pm, Ian Tickle wrote:
> On Thu, Mar 31, 2011 at 10:43 PM, James Holton jmhol...@lbl.gov wrote:
>> I have the 2002 edition, and indeed it only contains space group numbers up to 230. The page numbers quoted by Ian contain space group numbers 17 and 18.
>
> You need to distinguish the 'IT space group number' which indeed goes up to 230 (i.e. the number of unique settings), from the 'CCP4 space group number' which, peculiar to CCP4 (which is why I called it 'CCP4-ese'), adds a multiple of 1000 to get a unique number for the alternate settings as used in the API. The pages I mentioned show the diagrams for IT SG #18 P22121 (CCP4 #3018), P21221 (CCP4 #2018) and P21212 (CCP4 #18), so they certainly are all there!
>
>> Although I am all for program authors building in support for the screwy orthorhombics (as I call them), I should admit that my fuddy-duddy strategy for dealing with them remains simply to use space groups 17 and 18, and permute the cell edges around with REINDEX to put the unique (screw or non-screw) axis on the c position.
>
> Re-indexing is not an option for us (indeed if there were no alternative, it would be a major undertaking), because the integrity of our LIMS database requires that all protein-ligand structures from the same target crystal form are indexed with the same (or nearly the same) cell and space group (and it makes life so much easier!). With space-groups such as P22121 it can happen (indeed it has happened) that it was not possible to define the
[ccp4bb] problem of conventions
Excuse my naive (perhaps ignorant) question: when was the a<b<c rule/convention/standard/whatever introduced? None of the textbooks I came across mentions it as far as I could see (not that this is a reason for or against this rule of course).

Thanks,
Boaz

Boaz Shaanan, Ph.D.
Dept. of Life Sciences
Ben-Gurion University of the Negev
Beer-Sheva 84105
Israel
Phone: 972-8-647-2220 ; Fax: 646-1710
Skype: boaz.shaanan
Re: [ccp4bb] problem of conventions
Dear Boaz,

I think you are the one who is finally asking the essential question.

The classification we all know about, which goes back to the 19th century, is not into 230 space groups, but 230 space-group *types*, i.e. classes where every form of equivalencing (esp. by choice of setting) has been applied to the enumeration of the classes and the choice of a unique representative for each of them. This process of maximum reduction leaves very little room for introducing conventions like a certain ordering of the lengths of cell parameters. This seems to me to be a major mess-up in the field - a sort of second-hand mathematics by (IUCr) committee which has remained so ill-understood as to generate all these confusions. The work on the derivation of the classes of 4-dimensional space groups explained the steps of this classification beautifully (arithmetic classes - extension by non-primitive translations - equivalencing under the action of the normaliser), the last step being the choice of a privileged setting *in terms of the group itself* in choosing the representative of each class. The extra convention a<b<c leads to choosing that representative in a way that depends on the metric properties of the sample instead of once and for all (how about that for a brilliant step backward!).

Software providers then have to de-standardise the set of 230 space group *types* (where each representative is uniquely defined once you give the space group (*type*) number) to accommodate all alternative choices of settings that might be randomly thrown at them by the metric properties of e.g. everyone's orthorhombic crystals. Mathematically, what one then needs to return to is the step before taking out the action of the normaliser, but this picture gets drowned in clerical disputes about low-level software issues.
My own take on this (when I was writing symmetry-reduction routines for my NCS-averaging programs, along with space-group specific FFT routines in the dark ages) was: once you have a complete mathematical classification that is engraved in stone (i.e. in the old International Tables and in crystallographic software as we knew it), then stick to it and re-index back and forth to/from the unique representative listed under the IT number, as needed - don't try and extend group-theoretic Tables to re-introduce incidental metrical properties that had been so neatly factored out from the final symmetry picture. Otherwise you get a dog's dinner.

So much for my 0.02 Euro.

With best wishes,

Gerard.

--
On Fri, Apr 01, 2011 at 11:30:12AM +, Boaz Shaanan wrote:
> Excuse my naive (perhaps ignorant) question: when was the a<b<c rule/convention/standard/whatever introduced? None of the textbooks I came across mentions it as far as I could see (not that this is reason for or against this rule of course).
>
> Thanks, Boaz

--
Gerard Bricogne g...@globalphasing.com
Global Phasing Ltd.
Sheraton House, Castle Park
Cambridge CB3 0AX, UK
Tel: +44-(0)1223-353033
Fax: +44-(0)1223-366889
Re: [ccp4bb] problem of conventions
Dear Gerard,

The theory's fine as long as the space group can be unambiguously determined from the diffraction pattern. However practice is frequently just like the ugly fact that destroys the beautiful theory, which means that a decision on the choice of unit cell may have to be made on the basis of incomplete or imperfect information (i.e. mis-identification of the systematic absences). The 'conservative' choice (particularly if it's not necessary to make a choice at that time!) is to choose the space group without screw axes (i.e. P222 for orthorhombic). Then if it turns out later that you were wrong it's easy to throw away the systematic absences and change the space group symbol. If you make any other choice and it turns out you were wrong you might find it hard sometime later to recover the reflections you threw away! This of course implies that the unit-cell choice automatically conforms to the IT convention; this convention is of course completely arbitrary but you have to make a choice and that one is as good as any.

So at that point let's say this is the 1970s and you know it might be several years before your graduate student is able to collect the high-res data and do the model-building and refinement, so you publish the unit cell and tentative space group, and everyone starts making use of your data. Some years later the structure solution and refinement is completed and the space group can now be assigned unambiguously. The question is: do you then revise your previous choice of unit cell, risking the possibility of confusing everyone including yourself, just in order that the space-group setting complies with a completely arbitrary 'standard' (and the unit cell non-conventional), and requiring a re-index of your data (and permutation of the co-ordinate datasets)? Or do you stick with the IT unit cell convention and leave it as it is? For me the choice is easy ('if it ain't broke then don't fix it!').
Cheers

-- Ian

On Fri, Apr 1, 2011 at 1:40 PM, Gerard Bricogne g...@globalphasing.com wrote:
> Dear Boaz, I think you are the one who is finally asking the essential question. [...]
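The 'conservative' P222 strategy in the message above can be illustrated with a small sketch. It assumes only the standard extinction rule for a 2_1 screw axis (axial reflections with an odd index along that axis are absent); the function names and the reflection list are made up for illustration and do not come from any program mentioned in this thread.

```python
def is_absent(hkl, screws):
    """True if hkl is an axial reflection extinguished by a 2_1 screw axis.

    `screws` holds one flag per axis (a, b, c): True for a 2_1 screw,
    False for a plain 2-fold.
    """
    for axis in range(3):
        on_axis = all(hkl[j] == 0 for j in range(3) if j != axis)
        if screws[axis] and on_axis and hkl[axis] % 2 != 0:
            return True
    return False


def assign_space_group(reflections, screws):
    """Drop the systematic absences implied by the chosen screw axes."""
    return [hkl for hkl in reflections if not is_absent(hkl, screws)]


# Made-up reflection list, kept in full while the space group is only 'P222'.
measured = [(3, 0, 0), (0, 3, 0), (0, 0, 5), (0, 4, 0), (1, 2, 3)]

p222 = assign_space_group(measured, (False, False, False))
p22121 = assign_space_group(measured, (False, True, True))

print(len(p222), len(p22121))
```

Going from P222 to P22121 only discards reflections from a list that still holds everything; the reverse direction would need the discarded axial reflections back, which is the asymmetry the message describes.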
Re: [ccp4bb] problem of conventions
Dear Ian,

Well, it *IS* broke. If you are running some type of process, as you implied in referring to LIMS, then there is a step in which you move from the crystal system and point group to the actual space group. So, at that point you identify P22121. The next clear step, automatically by software, is to convert to P21212, and move on. That doesn't take an enormous amount of code writing, and you have a clear trail on how you got there.

To be even more intrusive, what if you had cell parameters of 51.100, 51.101, and 51.102, and it's orthorhombic, P21212? For other co-crystals, soaks, mutants, etc., you might have both experimental errors and real differences in the unit cell, so you're telling me that you would process according to the a<b<c rule in P222 to average and scale, and then it might turn out to be P22121, P21221, or P21212 later on? When you wish to compare coordinates, then you have to re-assign one coordinate dataset to match the other by using superposition, rather than taking on an earlier step of just using the conventional space group of P21212?

Again, while I see use of the a<b<c rule when there isn't an overriding reason to assign it otherwise, as in P222 or P212121, there *is* a reason to stick to the convention of one standard setting. That's the rationale on using P21/n sometimes vs. P21/c, or I2 vs C2, to avoid a large beta angle, and adopt a non-standard setting.

Finally, if you think it's fine to use P22121, then can I assume that you also allow the use of space group A2 and B2?

Bernie

On Fri, April 1, 2011 8:46 am, Ian Tickle wrote:
> Dear Gerard, The theory's fine as long as the space group can be unambiguously determined from the diffraction pattern. [...]
[ccp4bb] problem of conventions
Dear all,

I would like to share my experience with a rather unexpected problem of indexing conventions. Perhaps I can save people some time.

I have a crystal in the more unusual P21212 space-group (No 18). Its unit cell lengths are b>a>c (please note). I systematically use XDS for data integration, since so far it was able to handle even the most horrible-looking spots. Now XDS indexed my data in space-group 18, but with the axes order a<b<c! It had, in fact, invented a space-group P22121, which does not exist. I did not realise this until I had spent a couple of weeks with beautiful peaks in rotation functions, but hopeless results in translation functions. It wasn't until I looked more closely into the definition of the screw axes that I realised the problem.

POINTLESS does not allow a reindexing of reflexions within the same space-group, but fortunately REINDEX did the trick at the level of intensities, because I like to use SCALA for careful scaling of my data.

So, basically, beyond just warning people who might encounter similar problems, I was wondering if XDS could perhaps reindex reflexions according to Int. Table conventions once the screw axes of a crystal system have been identified?

With best wishes,

Anita

Anita Lewit-Bentley
Unité d'Immunologie Structurale
CNRS URA 2185
Département de Biologie Structurale & Chimie
Institut Pasteur
25 rue du Dr. Roux
75724 Paris cedex 15
FRANCE
Tel: 33- (0)1 45 68 88 95
FAX: 33-(0)1 40 61 30 74
email: ale...@pasteur.fr
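The re-indexing asked for in the message above (move the unique axis of a space group 17 or 18 setting onto c once the screw axes are known) amounts to a cyclic permutation of the axes. The sketch below is a hedged illustration only: the function names and the cell lengths are hypothetical, and it is not how XDS, POINTLESS or REINDEX implement it.

```python
# Hypothetical helpers for illustration; not the API of XDS, POINTLESS or REINDEX.

def order_for_unique_axis_on_c(screws):
    """Return which old axis becomes the new a, b and c.

    `screws` holds one flag per axis (True for a 2_1 screw, False for a
    plain 2-fold). Only cyclic permutations are returned, so a
    right-handed axis system stays right-handed.
    """
    unique = [i for i in range(3) if screws.count(screws[i]) == 1]
    if len(unique) != 1:
        raise ValueError("no unique axis, e.g. P222 or P212121")
    u = unique[0]
    return ((u + 1) % 3, (u + 2) % 3, u)


def permute(values, order):
    """Apply the axis permutation to a cell, an hkl triple or the screw flags."""
    return tuple(values[i] for i in order)


def symbol(screws):
    """Build a compact symbol such as 'P22121' from the screw flags."""
    return "P" + "".join("21" if s else "2" for s in screws)


# P22121 as produced by a<b<c indexing; the cell lengths are made-up numbers.
screws = (False, True, True)
cell = (50.0, 60.0, 70.0)
order = order_for_unique_axis_on_c(screws)

print(symbol(permute(screws, order)))  # symbol in the standard setting
print(permute(cell, order))            # permuted cell lengths
print(permute((1, 2, 3), order))       # h,k,l -> k,l,h for this case
```

With these made-up numbers the permuted cell has b>a>c, so the standard setting and the a<b<c cell convention cannot both be satisfied for such a crystal, which is what the rest of the thread argues about.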
Re: [ccp4bb] problem of conventions
If you are using CCP4, it can accommodate P22121. However, just reindex in CCP4 to the correct setting with P21212.

Bernie Santarsiero

On Thu, March 31, 2011 9:28 am, Anita Lewit-Bentley wrote:
> Dear all, I would like to share my experience with a rather unexpected problem of indexing conventions. [...]
Re: [ccp4bb] problem of conventions
The IUCr standard is to make a<b<c, see

http://nvl.nist.gov/pub/nistpubs/jres/107/4/j74mig.pdf
http://nvl.nist.gov/pub/nistpubs/jres/106/6/j66mig.pdf

and P 2 21 21 is a perfectly valid space group.

Pointless will reindex within the same point group, or you can choose the a<b<c convention (SETTING CELL_BASED) or the reference setting P 21 21 2 (SETTING SYMMETRY_BASED), so you have a choice.

Most programs are perfectly happy with space group P 2 21 21 (I'm not sure about [auto]Sharp).

Phil

On 31 Mar 2011, at 15:28, Anita Lewit-Bentley wrote:
> Dear all, I would like to share my experience with a rather unexpected problem of indexing conventions. [...]
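The cell-based choice mentioned above (order the axes so that a<b<c and let the symbol follow) can be sketched as below. This is an illustrative assumption about what such a choice involves, not POINTLESS's implementation; the function names and cell values are hypothetical, apart from the near-equal cell lengths, which are the ones raised elsewhere in this thread.

```python
# Hypothetical sketch; POINTLESS's own implementation is not shown in this thread.

EVEN = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}  # cyclic (handedness-preserving) orders


def cell_based_order(cell):
    """Old-axis indices sorted so that the new cell has a <= b <= c."""
    return tuple(sorted(range(3), key=lambda i: cell[i]))


def reindex(hkl, order):
    """Permute h,k,l; negate l after an odd permutation to stay right-handed."""
    h, k, l = (hkl[i] for i in order)
    return (h, k, l) if order in EVEN else (h, k, -l)


def symbol(screws, order):
    """Symbol of the permuted setting, e.g. 'P22121'."""
    return "P" + "".join("21" if screws[i] else "2" for i in order)


p21212 = (True, True, False)  # screw flags in the reference setting

# A made-up cell with b>a>c in the reference setting.
order = cell_based_order((60.0, 70.0, 50.0))
print(order, symbol(p21212, order), reindex((1, 2, 3), order))

# Two nearly identical crystals whose edge lengths differ only within error.
for cell in [(51.100, 51.101, 51.102), (51.102, 51.101, 51.100)]:
    print(cell, symbol(p21212, cell_based_order(cell)))
```

The last two lines show why the cell-based rule is sensitive to measurement error when the edges are nearly equal: the same crystal form can come out under two different symbols.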
Re: [ccp4bb] problem of conventions
Dear Anita,

I happen to have a very similar problem today. Does XDS use the desired setting if you provide it with the correct cell and space group during the IDXREF step? You can otherwise re-index in CORRECT.

To comment on Phil: I fed the mtz-file from pointless into ctruncate (or maybe it was scala) which left the space group string (P2 21 21) but turned the space group number 18 into 3018 - this does screw up autosharp and maybe also other programs which use the space group number/symbol and not the symmetry operators.

Tim

On Thu, Mar 31, 2011 at 04:28:18PM +0200, Anita Lewit-Bentley wrote:
> Dear all, I would like to share my experience with a rather unexpected problem of indexing conventions. [...]

--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
phone: +49 (0)551 39 22149
GPG Key ID = A46BEE1A
Re: [ccp4bb] problem of conventions
> To comment on Phil: I fed the mtz-file from pointless into ctruncate (or maybe it was scala) which left the space group string (P2 21 21) but turned the space group number 18 into 3018 - this does screw up autosharp and maybe also other programs which use the space group number/symbol and not the symmetry operators.

If that's the case autosharp should be fixed so it recognises the correct space group (in this case P22121, or #3018 in CCP4-ese)! It has been fixed in autoBuster so maybe you're using an old version of autosharp?

-- Ian
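The numbering that trips up autosharp here follows the scheme described elsewhere in this thread: the 'CCP4 space group number' adds a multiple of 1000 to the IT number for the alternate settings. A minimal sketch of reading such a number is shown below; the function name is made up, and the lookup table contains only the three settings of IT #18 quoted in the thread, not the full CCP4 list.

```python
def split_ccp4_number(ccp4_number):
    """Split a CCP4-style number into (IT number, setting index).

    Assumes only what the thread states: alternate settings are the IT
    number plus a multiple of 1000.
    """
    setting, it_number = divmod(ccp4_number, 1000)
    return it_number, setting


# The three settings of IT space group #18 quoted in this thread.
SG18_SETTINGS = {18: "P21212", 2018: "P21221", 3018: "P22121"}

for number, name in SG18_SETTINGS.items():
    print(number, name, split_ccp4_number(number))
```

A program that compares only the raw number against the 1 to 230 range would reject 3018, whereas one that reduces it first, or reads the symmetry operators, would not.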
Re: [ccp4bb] problem of conventions
> I would like to share my experience with a rather unexpected problem of indexing conventions. Perhaps I can save people some time. I have a crystal in the more unusual P21212 space-group (No 18). Its unit cell lengths are b>a>c (please note). I systematically use XDS for data integration, since so far it was able to handle even the most horrible-looking spots. Now XDS indexed my data in space-group 18, but with the axes order a<b<c! It had, in fact, invented a space-group P22121, which does not exist. I did not realise this until I had spent a couple of weeks with beautiful peaks in rotation functions, but hopeless results in translation functions. It wasn't until I looked more closely into the definition of the screw axes that I realised the problem. POINTLESS does not allow a reindexing of reflexions within the same space-group, but fortunately REINDEX did the trick at the level of intensities, because I like to use SCALA for careful scaling of my data.
>
> I was wondering if XDS could perhaps reindex reflexions according to Int. Table conventions once the screw axes of a crystal system have been identified?

The International Tables / IUCr / NIST convention _is_ a<=b<=c for orthorhombic so no re-indexing is necessary or desirable. See IT vol. A 5th ed. (2002), table 9.3.4.1 (p. 758 in my edition) for all the conventional cells. The problem may be that some programs are not sticking to the agreed convention - but then the obvious solution is to fix the program (or use a different one). Is the problem that XDS is indexing it correctly as P22121 but calling it SG #18 (i.e. instead of the correct #3018)? That would certainly confuse all CCP4 programs which generally tend to use the space-group number first if it's available.

I'm not clear what you mean when you say P22121 doesn't exist? It's clearly shown in my edition of IT (p. 202). Maybe your lab needs to invest in the most recent edition of IT?

Cheers

-- Ian
Re: [ccp4bb] problem of conventions
Interesting. My IT, both volume I and volume A (1983) only have P21212 for space group #18. Do I have to purchase a new volume A every year to keep up with the new conventions? Cheers, Bernie
Re: [ccp4bb] problem of conventions
There are no 'new' conventions to keep up with: recent editions of the old volume 1 or new A do not disagree on the question of the unit cell conventions (except for minor details which don't affect the majority of the common space groups), where by recent I mean going back ~ 70 years. So it's certainly not the case that the conventions are changing every year (that would be silly!) - they have been defined exactly once in the last 100 years! I believe the unit cell conventions currently in use were actually first defined by the 1952 edition of International Tables, so both the 1969 edition (volume '1') and the 1983 edition (1st of volume 'A') will certainly describe them. I have only the 2002 edition (the 5th) so I can't tell you exactly where to find the relevant info in the older editions. The very first edition of IT (1935 I believe) did not define the unit cell conventions, only the space groups, so I wouldn't recommend that! The older editions did not include information on alternate space-group settings simply in order to save paper: the 1952 edition was published in the years following WW2 when there was a paper shortage, so this was an important consideration! Only one setting (the 'standard setting') of each space group, chosen arbitrarily, was described and the crystallographer was expected to permute it to get the desired setting. If you need to see all the alternate settings laid out explicitly then you need to get hold of a recent (e.g. the 5th printed or 1st online) edition; failing that you have to work them out yourself! I thought the alternate settings were first described (though possibly without the diagrams) in the 1st (1983) edition of volume A, but I'm relying on memory and could well be wrong. The setting was often chosen to be consistent with a pre-existing isomorphous structure (i.e. 
generally isomorphism overrides convention); if there was none either the setting was defined by the unit cell convention, or often it was simply easiest to use the standard setting. Of course not everyone followed the conventions: it was common to write programs that could handle only the standard settings (and it still is!). Wiser programmers allowed space-groups to be defined arbitrarily by the equivalent positions instead of the number or symbol, so then it was straightforward to select any desired alternate setting. Note that the convention describes the unit cells, from which the space-group symbols are then derived, not the other way around. The rationale behind this is simple: there was a time not so long ago (which I remember!) when data collection and structure solution for even routine structures was actually non-trivial (I'm not implying that it's always trivial even nowadays!). However it was possible relatively straightforwardly to obtain the unit cell from precession photos (i.e. the indexing). It used to be common practice to publish an initial communication giving the unit cell and possibly a tentative space group; this would be followed up (often several years later!) by structures determined to successively higher resolution as more data was collected. Of course it was not possible to be 100% certain of the space-group assignment from the precession photos (and for several space-groups there is of course no unique space-group determinable from the systematic absences alone); final space-group assignment often had to wait several years for the structure determination. Hence it made sense to define the setting from the unit cell, not the space group. I recommend the 2 papers from the US National Institute of Standards & Technology (see Phil's posting) for more on this: the NIST conventions are the same as the IUCr ones, i.e. based on the unit cell (in fact Alan Mighell when he was at NIST wrote much of the unit-cell convention material in IT).
-- Ian
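The point that the convention fixes the unit cell first, and the symbol is then read off from it, can be illustrated with a minimal Python sketch. It assumes a primitive orthorhombic cell in point group 222, with each axis flagged as either a plain 2-fold or a 21 screw; the function name `conventional_setting` and the input layout are hypothetical and not taken from any CCP4 program. It only names the setting: it does not reindex reflections, which would also need care over handedness when the axis swap is not cyclic.

```python
def conventional_setting(lengths, is_screw):
    """Order the axes of a primitive orthorhombic (222) cell so that a <= b <= c
    and return (sorted_lengths, symbol).

    lengths  -- three cell edge lengths, in any order
    is_screw -- three booleans, True where the 2-fold along that axis is a 2(1) screw
    """
    # Sort axis indices by edge length; each screw flag travels with its axis.
    order = sorted(range(3), key=lambda i: lengths[i])
    sorted_lengths = tuple(lengths[i] for i in order)
    # Read the symbol off in the new a, b, c order.
    symbol = "P" + "".join("21" if is_screw[i] else "2" for i in order)
    return sorted_lengths, symbol


# A cell whose shortest edge carries the only non-screw 2-fold:
print(conventional_setting((70.0, 50.0, 60.0), (True, False, True)))
# ((50.0, 60.0, 70.0), 'P22121')
```

If the longest edge happens to carry the non-screw axis, the same function returns the standard symbol P21212, so the standard setting is just one of the possible outcomes of the cell convention.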
Re: [ccp4bb] problem of conventions
I have the 2002 edition, and indeed it only contains space group numbers up to 230. The page numbers quoted by Ian contain space group numbers 17 and 18. Although I am all for program authors building in support for the screwy orthorhombics (as I call them), I should admit that my fuddy-duddy strategy for dealing with them remains simply to use space groups 17 and 18, and permute the cell edges around with REINDEX to put the unique (screw or non-screw) axis on the c position. I have yet to encounter a program that gets broken when presented with data that doesn't have a<b<c, but there are many non-CCP4 programs out there that still don't seem to understand P22121, P21221, P2122 and P2212. This is not the only space group convention issue out there! The R3x vs H3x business continues to be annoying to this day! -James Holton MAD Scientist
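The strategy James describes, permuting the axes so that the unique (screw or non-screw) axis ends up on c, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how REINDEX is implemented: the function names `unique_axis_to_c` and `permute` are hypothetical, the input is just three screw/non-screw flags, and only cyclic permutations are used so that the axes stay right-handed.

```python
def unique_axis_to_c(is_screw):
    """Return a cyclic permutation (tuple of old-axis indices) that moves the
    unique axis of a P222-family setting to the c position."""
    flags = [bool(s) for s in is_screw]
    n_screw = sum(flags)
    if n_screw not in (1, 2):
        raise ValueError("no unique axis: all three axes are of the same kind")
    # The unique axis is the lone screw (one screw in total) or the lone
    # non-screw 2-fold (two screws in total).
    unique_is_screw = (n_screw == 1)
    u = flags.index(unique_is_screw)
    # New (a, b, c) = old axes (u+1, u+2, u): a cyclic shift keeps handedness.
    return ((u + 1) % 3, (u + 2) % 3, u)


def permute(triple, perm):
    """Apply the axis permutation to a cell-length triple or to Miller indices."""
    return tuple(triple[i] for i in perm)


perm = unique_axis_to_c((False, True, True))   # P22121: non-screw axis on a
print(perm)                                    # (1, 2, 0)
print(permute((50.0, 60.0, 70.0), perm))       # (60.0, 70.0, 50.0)
print(permute((1, 2, 3), perm))                # (2, 3, 1)
```

Miller indices permute the same way as the cell edges, so one permutation serves for both. Note that the resulting cell no longer has a<b<c, which is exactly the trade-off being argued about in this thread.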
Re: [ccp4bb] problem of conventions
On Thu, Mar 31, 2011 at 10:43 PM, James Holton jmhol...@lbl.gov wrote: I have the 2002 edition, and indeed it only contains space group numbers up to 230. The page numbers quoted by Ian contain space group numbers 17 and 18. You need to distinguish the 'IT space group number' which indeed goes up to 230 (i.e. the number of unique settings), from the 'CCP4 space group number' which, peculiar to CCP4 (which is why I called it 'CCP4-ese'), adds a multiple of 1000 to get a unique number for the alternate settings as used in the API. The page I mentioned shows the diagrams for IT SG #18 P22121 (CCP4 #3018), P21221 (CCP4 #2018) and P21212 (CCP4 #18), so they certainly are all there! Although I am all for program authors building in support for the screwy orthorhombics (as I call them), I should admit that my fuddy-duddy strategy for dealing with them remains simply to use space groups 17 and 18, and permute the cell edges around with REINDEX to put the unique (screw or non-screw) axis on the c position. Re-indexing is not an option for us (indeed if there were no alternative, it would be a major undertaking), because the integrity of our LIMS database requires that all protein-ligand structures from the same target crystal form are indexed with the same (or nearly the same) cell and space group (and it makes life so much easier!). With space-groups such as P22121 it can happen (indeed it has happened) that it was not possible to define the space group correctly at the processing stage due to ambiguous absences; indeed it was only after using the SGALternative ALL option in Phaser and refining each TF solution that we identified the space group correctly as P22121.
Having learnt the lesson the hard way, we routinely use P222 for all processing of orthorhombics, which of course always gives the conventional a<b<c setting, and only assign the space group well down the pipeline and only when we are 100% confident; by that time it's too late to re-index (indeed why on earth would we want to give ourselves all that trouble?). This is therefore totally analogous to the scenario of yesteryear that I described where it was common to see a 'unit cell' communication followed some years later by the structure paper (though we have compressed the gap somewhat!), and we base the setting on the unit cell convention for exactly the same reason. It's only if you're doing 1 structure at a time that you can afford the luxury of re-indexing - and also the pain: many times I've seen even experienced people getting their files mixed up and trying to refine with differently indexed MTZ & PDB files (why is my R factor so high?)! My advice would be - _never_ re-index! -- Ian I have yet to encounter a program that gets broken when presented with data that doesn't have a<b<c, but there are many non-CCP4 programs out there that still don't seem to understand P22121, P21221, P2122 and P2212. I find that surprising! Exactly which 'many' programs are those? You really should report them to CCP4 (or to me if it's one of mine) so they can be fixed! We've been using CCP4 programs as integral components of our processing pipeline (from data processing through to validation) for the last 10 years and I've never come across one that's broken in the way you describe (I've found many broken for other reasons and either fixed them myself or reported them - you should do the same!). Any program which uses csymlib with syminfo.lib can automatically handle all space groups defined in syminfo, which includes all the common alternates you mentioned (and others such as I2).
The only program I'm aware of that's limited to the standard settings is sftools (because it has its own internal space group table - it would be nice to see it updated to use syminfo!). This is not the only space group convention issue out there! The R3x vs H3x business continues to be annoying to this day! Yeah to that! H centring was defined in IT long ago (look it up) and it has nothing to do with the R setting! -- Ian
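The distinction between the IT number and the CCP4 number, and the kind of header mismatch reported earlier in the thread (symbol P2 21 21 stored alongside number 18), can be made concrete with a small Python sketch. It is an assumption-laden illustration rather than the csymlib API: the table contains only the three settings of IT #18 quoted above, and the names `SETTINGS_18`, `it_number` and `header_is_consistent` are hypothetical.

```python
# Settings of IT space group #18 quoted in this thread, keyed by CCP4 number.
SETTINGS_18 = {18: "P21212", 2018: "P21221", 3018: "P22121"}


def it_number(ccp4_number):
    """Strip the multiple of 1000 that CCP4 adds for alternate settings."""
    return ccp4_number % 1000


def header_is_consistent(ccp4_number, symbol):
    """True if the space-group number and symbol name the same setting.

    Spaces in the symbol are ignored, so 'P2 21 21' matches 'P22121'.
    Numbers outside the small table above are reported as inconsistent.
    """
    expected = SETTINGS_18.get(ccp4_number)
    return expected is not None and expected == symbol.replace(" ", "")


print(it_number(3018))                         # 18
print(header_is_consistent(18, "P2 21 21"))    # False
print(header_is_consistent(3018, "P2 21 21"))  # True
```

A check of this sort would flag a file whose number and symbol disagree before it reaches a program that trusts the number alone.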
Re: [ccp4bb] problem of conventions
Ian, I think it's amazing that we can program computers to resolve a<b<c but it would be a major undertaking to store the matrix transformations for 22121 to 21212 and reindex a cell to a standard setting. I was also told that I was lazy to not reindex to the standard setting when I was a grad student. Now it takes less than a minute to enter a transformation and re-index. The orthorhombic rule of a<b<c makes sense in 222 or 212121, but when there is a standard setting of the 2-fold along the c-axis, then why not adopt that? Often we chose a non-standard setting when there was a historical precedent, as in the comparison of one structure to another, e.g., P21/c with beta greater than 120deg vs. P21/n, etc. That is no more difficult with modern computing than dragging along three space groups for #18. There was a compactness to 230, and only 230 space groups. (I cheat, since I agree there are both the rhombohedral and hexagonal cell settings for R3bar.) Bernie