Re: [ccp4bb] How many is too many free reflections?

2015-06-05 Thread Gerard Bricogne
Dear Frank,

 I was going to reply to Ian's last comment last night, but got
distracted.

 This last paragraph of Ian's message does sound rather negative
if detached from the context of the previous one, which was about
non-isomorphism between fragment complexes and the apo being the rule
rather than the exception. Ian uses the Crick-Magdoff definition of
an acceptable level of non-isomorphism, which is quite a stringent one
because it refers to a level that would invalidate isomorphism for
experimental phasing purposes. A much greater level of non-isomorphism
can be tolerated when it comes to solving a target-fragment complex
starting from the apo structure, so the Crick-Magdoff criterion is not
relevant here.

 Furthermore, I think that Ian perhaps too readily identifies the
effect of non-isomorphism in creating noise in the comparison of
intensities with its effect of invalidating the working vs. free status
of observations. I think, therefore, that Ian's claim that failing the
Crick-Magdoff criterion for isomorphism results in scrambling the
distinction between the working set and the free set is a very big
overstatement.

 You describe as "bookkeeping faff" the procedures that Ian and I
outlined to preserve the FreeR flags of the apo refinement, and ask
for a paper. These matters are probably not glamorous enough to find
their way into papers, and would best be discussed (or re-discussed)
in a specialised BB like this one. If the shift from the question "How
many is too many?" to "How should the free set be chosen?" that I tried
to bring about yesterday results in a general sharing of evidence that
otherwise gets set aside, I will be very happy. I would find it unwise
to dismiss this question by expecting that there would be a mountain
of published evidence if it were really important.

 Let us go ahead, then: could everyone who has evidence (rather
than preconceptions) on this matter please come forward and share it?
Answering this question is very important, even if the conclusion is
that the faff is unimportant.


 With best wishes,
 
  Gerard.

--
On Thu, Jun 04, 2015 at 10:43:15PM +0100, Frank von Delft wrote:
 I'm afraid Gerard and Ian between them have left me a bit confused
 with conflicting statements:
 
 
 On 04/06/2015 15:29, Gerard Bricogne wrote:
 <snip>
 In order to guard the detection of putative bound fragments against the 
 evils of model bias, it is very important to ensure that the refinement of 
 each complex against data collected on it does not treat as free any 
 reflections that were part of the working set in the refinement of the apo 
 structure.
 <snip>
 
 On 04/06/2015 17:34, Ian Tickle wrote:
 <snip>
 So I suspect that most of our efforts in maintaining common free R
 flags are for nothing; however it saves arguments with referees
 when it comes to publication!
 <snip>
 
 
 I also remember conversations and even BB threads that made me
 conclude that it did NOT matter to have the same Rfree set for
 independent datasets (e.g. different crystals).  I confess I don't
 remember the arguments, only the relief at not having to bother with
 all the bookkeeping faff Gerard outlines and Ian describes.
 
 So:  could someone explain in detail why this matters (or why not),
 and is there a URL to the evidence (paper or anything else) in
 either direction?
 
 (As far as I remember, the argument went that identical free sets
 were unnecessary even for exactly isomorphous crystals.  Something
 like this:  model bias is not a big deal when the model has largely
 converged, and that's what you have for molecular substitution (as
 Jim Pflugrath calls it).  In addition, even a weakly binding
 fragment compound produces intensity perturbations large enough to
 make model bias irrelevant.)
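 A rough back-of-envelope along Crick-Magdoff lines makes that last point
 concrete: the r.m.s. fractional intensity change from adding N_add atoms to
 a protein of N_prot atoms is roughly sqrt(2*N_add/N_prot) for acentric
 reflections. The numbers below are purely illustrative assumptions:

# Back-of-envelope (illustrative numbers only): expected r.m.s. fractional
# intensity change when a small fragment binds to a protein, using the
# Crick-Magdoff-style acentric estimate sqrt(2*N_add/N_prot)*(f_add/f_prot).
from math import sqrt

n_protein_atoms = 3000    # non-H atoms in a medium-sized protein (assumed)
n_fragment_atoms = 15     # non-H atoms in a typical fragment (assumed)
occupancy = 0.5           # partial occupancy of a weak binder (assumed)

rms_change = sqrt(2.0 * n_fragment_atoms / n_protein_atoms) * occupancy
print(f"r.m.s. fractional intensity change ~ {rms_change:.1%}")
# ~5% at half occupancy, ~10% at full occupancy - comparable to or larger
# than a typical Rmerge, i.e. not a subtle perturbation.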
 
 phx


Re: [ccp4bb] How many is too many free reflections?

2015-06-05 Thread Gerard Bricogne
Dear Dusan,

 This is a nice paper and an interestingly different approach to
avoiding bias and/or quantifying errors - and indeed there are all
kinds of possibilities if you have a particular structure on which you
are prepared to spend unlimited time and resources.

 The specific context in which Graeme's initial question led me to
query instead "Who should set the FreeR flags, at what stage and on
what basis?" was that of the data analysis linked to high-throughput
fragment screening, in which speed is of the essence at every step.

 Creating FreeR flags afresh for each target-fragment complex
dataset without any reference to those used in the refinement of the
apo structure is by no means an irrecoverable error, but it will take
extra computing time to let the refinement of the complex adjust to a
new free set, starting from a model refined with the ignored one. It
is in order to avoid the need for that extra time, or for a recourse
to various debiasing methods, that the book-keeping faff described
yesterday has been introduced. Operating without it is perfectly
feasible; it is just unlikely to be optimally direct.
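
 For what it is worth, the bookkeeping itself amounts to little more than a
join on Miller indices. Below is a minimal sketch in Python, with made-up
text-file inputs and the convention that flag 0 marks the free set; in
practice one would of course operate on the MTZ files with the standard
CCP4 utilities.

# Sketch: carry the apo FreeR flags over to the complex dataset and assign
# fresh flags (here ~5% free) only to reflections not present in the apo set.
# File names and the whitespace "h k l flag" layout are hypothetical.
import random

def read_flags(path):
    flags = {}
    with open(path) as fh:
        for line in fh:
            h, k, l, flag = line.split()
            flags[(int(h), int(k), int(l))] = int(flag)
    return flags

def read_indices(path):
    with open(path) as fh:
        return [tuple(int(x) for x in line.split()[:3]) for line in fh]

def inherit_free_flags(apo_flags, complex_hkl, free_fraction=0.05, seed=0):
    rng = random.Random(seed)
    new_flags = {}
    for hkl in complex_hkl:
        if hkl in apo_flags:
            new_flags[hkl] = apo_flags[hkl]      # keep the apo assignment
        else:                                    # reflection new to the complex
            new_flags[hkl] = 0 if rng.random() < free_fraction else 1
    return new_flags

flags = inherit_free_flags(read_flags("apo_flags.txt"),
                           read_indices("complex_hkl.txt"))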

 I will probably bow out here, before someone asks "How many
[e-mails from me] is too many?" :-).


 With best wishes,
 
  Gerard.

--
On Fri, Jun 05, 2015 at 09:14:18AM +0200, dusan turk wrote:
 Graeme,
 one more suggestion. You can avoid all the recipes by using all data for the
 WORK set and 0 reflections for the TEST set, regardless of the amount of data,
 with the FREE KICK ML target. For an explanation see our recent paper: Praznikar, J.
 & Turk, D. (2014) Free kick instead of cross-validation in maximum-likelihood
 refinement of macromolecular crystal structures. Acta Cryst. D70, 3124-3134.
 
 A link to the paper can be found at http://www-bmb.ijs.si/doc/references.HTML
 
 best,
 dusan
 
  
 
  On Jun 5, 2015, at 1:03 AM, CCP4BB automatic digest system 
  lists...@jiscmail.ac.uk wrote:
  
  <snip>

Re: [ccp4bb] How many is too many free reflections?

2015-06-05 Thread dusan turk
Graeme,
one more suggestion. You can avoid all the recipes by using all data for the WORK
set and 0 reflections for the TEST set, regardless of the amount of data, with the
FREE KICK ML target. For an explanation see our recent paper: Praznikar, J. & Turk,
D. (2014) Free kick instead of cross-validation in maximum-likelihood
refinement of macromolecular crystal structures. Acta Cryst. D70, 3124-3134.

A link to the paper can be found at http://www-bmb.ijs.si/doc/references.HTML

best,
dusan

 

 On Jun 5, 2015, at 1:03 AM, CCP4BB automatic digest system 
 lists...@jiscmail.ac.uk wrote:
 
 Date:Thu, 4 Jun 2015 08:30:57 +
 From:Graeme Winter graeme.win...@gmail.com
 Subject: Re: How many is too many free reflections?
 
 Hi Folks,
 
 Many thanks for all of your comments - in keeping with the spirit of the BB
 I have digested the responses below. Interestingly I suspect that the
 responses to this question indicate the very wide range of resolution
 limits of the data people work with!
 
 Best wishes Graeme
 
 ===
 
 Proposal 1:
 
 10% reflections, max 2000
 
 Proposal 2: from wiki:
 
 http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Test_set
 
 including Randy Read recipe:
 
 So here's the recipe I would use, for what it's worth:
  < 10,000 reflections:      set aside 10%
  10,000-20,000 reflections: set aside 1000 reflections
  20,000-40,000 reflections: set aside 5%
  > 40,000 reflections:      set aside 2000 reflections
 
 Proposal 3:
 
 5% maximum 2-5k
 
 Proposal 4:
 
 3% minimum 1000
 
 Proposal 5:
 
 5-10% of reflections, minimum 1000
 
 Proposal 6:
 
  At least 50 reflections per bin in order to get reliable ML parameter
  estimation, ideally around 150 per bin.
 
 Proposal 7:
 
  If there are lots of reflections (e.g. 800K unique), select around 1% - 5% would be
  40k, i.e. rather a lot. Referees question the use of > 5k reflections as a test
  set.
 
 Comment 1 in response to this:
 
 Surely absolute # of test reflections is not relevant, percentage is.
 
 
 
 Approximate consensus (i.e. what I will look at doing in xia2) - probably
 follow Randy Read recipe from ccp4wiki as this seems to (probably) satisfy
 most of the criteria raised by everyone else.
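  
  A minimal sketch of how such a recipe could be wired into a processing script
  (the function name is made up and the thresholds are those quoted above):

# Sketch of the test-set sizing recipe quoted above; the function name is
# illustrative and the thresholds are the reconstructed ccp4wiki values.
def free_set_size(n_unique):
    """Suggested number of free (test) reflections for n_unique reflections."""
    if n_unique < 10000:
        return int(0.10 * n_unique)   # 10%
    if n_unique < 20000:
        return 1000                   # fixed 1000
    if n_unique < 40000:
        return int(0.05 * n_unique)   # 5%
    return 2000                       # fixed 2000

for n in (5000, 15000, 30000, 100000, 800000):
    print(f"{n:>7d} unique -> {free_set_size(n):4d} free"
          f" ({100.0 * free_set_size(n) / n:.2f}%)")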
 
 
 
 On Tue, Jun 2, 2015 at 11:26 AM Graeme Winter graeme.win...@gmail.com
 wrote:
 
 Hi Folks
 
 Had a vague comment handed my way that xia2 assigns too many free
 reflections - I have a feeling that by default it makes a free set of 5%
 which was OK back in the day (like I/sig(I) = 2 was OK) but maybe seems
 excessive now.
 
 This was particularly in the case of high resolution data where you have a
 lot of reflections, so 5% could be several thousand which would be more
 than you need to just check Rfree seems OK.
 
  Since I really don't know what the right # of reflections to assign to a
  free set is, I thought I would ask here - what do you think? Essentially I need
  to assign a minimum %age or a minimum # - the lower of the two presumably?
 
 Any comments welcome!
 
  Thanks & best wishes, Graeme
 
 

Dr. Dusan Turk, Prof.
Head of Structural Biology Group http://bio.ijs.si/sbl/ 
Head of Centre for Protein and Structure Production
Centre of excellence for Integrated Approaches in Chemistry and Biology of 
Proteins, Scientific Director
http://www.cipkebip.org/
Professor of Structural Biology at IPS Jozef Stefan
e-mail: dusan.t...@ijs.si
phone: +386 1 477 3857   Dept. of Biochem. Mol. Struct. Biol.
fax:   +386 1 477 3984   Jozef Stefan Institute
Jamova 39, 1000 Ljubljana, Slovenia
Skype: dusan.turk (voice over internet: www.skype.com)


Re: [ccp4bb] New ligand 3-letter code

2015-06-05 Thread Eleanor Dodson
I use any 3-letter/number code that I want. If you read the corresponding
cif file into Coot it is used in preference to any in the library. The PDB
deposition team will assign a code if it is a new ligand to the database.
Could you relay this to the original poster?

Thanks

Jim Brannigan


On 5 June 2015 at 14:58, Eleanor Dodson eleanor.dod...@york.ac.uk wrote:

 OK - thank you.
 How are things?
 E


 -- Forwarded message --
 From: Jim Brannigan jim.branni...@york.ac.uk
 Date: 5 June 2015 at 14:39
 Subject: Re: New ligand 3-letter code
 To: Eleanor Dodson eleanor.dod...@york.ac.uk


 Hi Eleanor

 I use any 3-letter/number code that I want. If you read the corresponding
 cif file into Coot it is used in preference to any in the library. The PDB
 deposition team will assign a code if it is a new ligand to the database.
 Could you relay this to the original poster?

 Thanks

 Jim Brannigan

 On 5 June 2015 at 11:28, Eleanor Dodson eleanor.dod...@york.ac.uk wrote:

 I use your method - trial & error..
 It would be nice if at least there was a list somewhere of unassigned
 codes!
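 As a stopgap one can generate such a list oneself. A minimal sketch, assuming
 a local copy of the wwPDB chemical component dictionary (components.cif), in
 which each entry starts with a data_<ID> line:

# Sketch: list a few three-character codes not yet used in the chemical
# component dictionary. Assumes "components.cif" has been downloaded locally;
# entry IDs appear on lines beginning with "data_".
from itertools import product
from string import ascii_uppercase, digits

used = set()
with open("components.cif") as fh:
    for line in fh:
        if line.startswith("data_"):
            used.add(line[5:].strip().upper())

alphabet = digits + ascii_uppercase
candidates = ("".join(c) for c in product(alphabet, repeat=3))
unassigned = (code for code in candidates if code not in used)

for _ in range(10):      # print ten unassigned candidate codes
    print(next(unassigned))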


 On 5 June 2015 at 09:16, Lau Sze Yi (SIgN) 
 lau_sze...@immunol.a-star.edu.sg wrote:

 Hi,

 What is the proper way of generating a 3-letter code for a new ligand? As
 of now, I insert my ligand in Coot using a SMILES string, and for the 3-letter
 code I picked a non-existent code by trial and error (not very efficient).
 A cif file with the corresponding name, which I generated using Phenix, was
 imported into Coot.

 I am sure there is a proper way of doing this. Appreciate your feedback.

 Regards,
 Sze Yi







Re: [ccp4bb] PyMOL v. Coot map 'level'

2015-06-05 Thread Thomas Holder
Hi Emilia and Steven,

(re-posting after accidentally replying to the coot mailing list)

After off-list discussion with Steven, I updated:
http://pymolwiki.org/index.php/Normalize_ccp4_maps

If the goal is to match the display in Coot, this is what I would do:

# load map into PyMOL but don't normalize
set normalize_ccp4_maps, off
load yourmap.ccp4
load yourpdb.pdb

# create a mesh which matches Coot's level = 0.3462 e/A^3 (~1.00 rmsd)
isomesh mesh, yourmap, 0.3462, (yourpdb)

PyMOL extends the map based on the symmetry information from the selection in 
the 4th argument. No need to create an extended map with MAPMASK as long as 
yourpdb.pdb has symmetry information. Same is true if the map came from an 
MTZ file.

I also updated http://pymolwiki.org/index.php/Display_CCP4_Maps and changed 
cover 'all atoms in PDB file' to cover 'asymmetric unit'. That way PyMOL's 
normalization should be identical to Coot's.

Regarding the question "What does PyMOL's 1.0 mean in electrons/A^3?": After
normalization (with normalize_ccp4_maps=on) PyMOL doesn't know about the 
original values anymore. I assume Coot takes the original values from the file 
as e/A^3, so if you don't normalize in PyMOL, you'll get e/A^3.
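
To make the arithmetic concrete, here is a small synthetic illustration (made-up
numbers, not a real map) of why the same absolute level corresponds to different
"rmsd" levels depending on which grid points the sigma is computed over:

# Synthetic illustration: the sigma used for "n-rmsd" contouring depends on
# which grid points it is computed over, so one absolute level (in e/A^3)
# maps to different rmsd levels. All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
protein_region = rng.normal(0.0, 0.70, size=20_000)   # feature-rich region
solvent_region = rng.normal(0.0, 0.15, size=80_000)   # flat bulk solvent
whole_cell = np.concatenate([protein_region, solvent_region])

sigma_cell = whole_cell.std()        # Coot-style: whole asymmetric unit
sigma_masked = protein_region.std()  # grid points around the model only

absolute_level = 0.3462              # e/A^3, as in the example above
print(f"sigma over cell:  {sigma_cell:.3f} -> {absolute_level / sigma_cell:.2f} rmsd")
print(f"sigma near model: {sigma_masked:.3f} -> {absolute_level / sigma_masked:.2f} rmsd")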

Hope that helps.

Cheers,
 Thomas

On 05 Jun 2015, at 01:36, Emilia C. Arturo (Emily) ec...@drexel.edu wrote:

 Thomas,
  
 I tried to figure out the PyMOL vs. Coot normalization discrepancy a while 
 ago. As far as I remember, PyMOL normalizes on the raw data array, while Coot 
 normalizes across the unit cell. So if the data doesn't exactly cover the 
 cell, the results might be different.
 
 I posted the same question to the Coot mailing list (the thread can be found 
 here: https://goo.gl/YjVtTu), and got the following reply from Paul Emsley;
 I highlight the questions that I think you could best answer, with '***':
 
 [ ...]
 I suspect that the issue is related to different answers to "the rmsd of
 what?"
 
 In Coot, we use all the grid points in the asymmetric unit - other programs 
 make a selection of grid points around the protein (and therefore have less 
 solvent).
 
 More solvent means a lower rmsd. If one then contours at n-rmsd levels, the
 absolute level used in Coot will be lower - and the map will thus seem noisier
 (perhaps).  I suppose that if you want comparable levels from the same 
 map/mtz file then you should use absolute levels, not rmsd. ***What does 
 PyMOL's 1.0 mean in electrons/A^3?***
 
 Regards,
 
 Paul.
 
 Regards,
 Emily.
 
 
 On 01 Jun 2015, at 11:37, Emilia C. Arturo (Emily) ec...@drexel.edu wrote:
 One cannot understand what is going on without knowing how this map
  was calculated.  Maps calculated by the Electron Density Server have
  density in units of electron/A^3 if I recall, or at least its best
  effort to do so.
 
  This is what I was looking for! (i.e. what the units are) Thanks. :-)
  Yes, I'd downloaded the 2mFo-DFc map from the EDS, and got the same Coot v. 
  PyMOL discrepancy whether or not I turned off the PyMOL map normalization 
  feature.
 
 If you load the same map into Pymol and ask it to normalize the
  density values you should set your contour level to Coot's rmsd level.
   If you don't normalize you should use Coot's e/A^3 level.  It is
  quite possible that they could differ by a factor of two.
 
  This was exactly the case. The map e/A^3 level (not the rmsd level) in Coot
  matched very well, visually, the map 'level' in PyMOL; the e/A^3 and rmsd
  levels were roughly a factor of 2 apart.
 
  I did end up also generating a 2mFo-DFc map using phenix, which fetched the 
  structure factors of the model in which I was interested. The result was 
  the same (i.e. PyMOL 'level' = Coot e/A^3 level ~ = 1/2 Coot's rmsd level) 
  whether I used the CCP4 map downloaded from the EDS, or generated from the 
  structure factors with phenix.
 
  Thanks All.
 
  Emily.
 
 
 
  Dale Tronrud
 
  On 5/29/2015 1:15 PM, Emilia C. Arturo (Emily) wrote:
   Hello. I am struggling with an old question--old because I've found
   several discussions and wiki bits on this topic, e.g. on the PyMOL
   mailing list
   (http://sourceforge.net/p/pymol/mailman/message/26496806/ and
   http://www.pymolwiki.org/index.php/Display_CCP4_Maps), but the
   suggestions about how to fix the problem are not working for me,
   and I cannot figure out why. Perhaps someone here can help:
  
   I'd like to display (for beauty's sake) a selection of a model with
   the map about this selection. I've fetched the model from the PDB,
   downloaded its 2mFo-DFc CCP4 map, loaded both the map and model
   into both PyMOL (student version) and Coot (0.8.2-pre EL (revision
   5592)), and decided that I would use PyMOL to make the figure. I
   notice, though, that the map 'level' in PyMOL is not equivalent to
   the rmsd level in Coot, even when I set normalization off in PyMOL.
   I expected that a 1.0 rmsd level in Coot would look identical to a
   1.0 level in PyMOL, but it does not; rather, a 1.0 rmsd level in
   

[ccp4bb] CSHL X-ray Methods in Structural Biology Course Oct 12-27, 2015: Application deadline June 15th

2015-06-05 Thread Jim Pflugrath
The June 15th deadline for applications to the CSHL X-ray Methods in
Structural Biology Course to be held later this year, October 12 through
October 27, 2015, is rapidly approaching.

The official course announcement is here:
https://meetings.cshl.edu/courses.aspx?course=C-CRYS&year=15
so please pass this on to folks who might be interested and who would
benefit.

I think people will agree that this course is an outstanding place to learn
both the theoretical and practical aspects of Macromolecular
Crystallography because of the extensive lectures from world-renowned
teachers and the hands-on experiments.

This year's course will see the return of the long-time instruction team of
Alex McPherson, Gary Gilliland, Bill Furey and myself along with many
talented experts (see the course flyer linked above for more name dropping)
to help us give the participants an experience in Macromolecular
Crystallography learning that cannot be found anywhere else.  (The
student:teacher ratio ends up being about 1:1).  We expect to have the
participants crystallize several proteins and determine their structures
all in about two weeks.  They will also become well-versed in the theory of
X-ray diffraction and crystal structure determination while having lots of fun,
but not much sleep.

The course is limited to 16 participants due to the very hands-on nature of
the experiments and the intimate seminar room and laboratory settings.
Please check the above web link for more details.  In particular, please
note the information about fellowships, scholarships, and stipends that are
available.

This course is supported with funds provided by the National Institute of
General Medical Sciences for which we are grateful.

If anyone has any questions, please send me an e-mail; I will be happy to
answer all queries.

Thanks, Jim