[ccp4bb] Does any software use non-TRIPOS sections in mol2 files

2012-04-18 Thread Thomas Womack
Is it safe to assume that the section headers in mol2 files are all

@<TRIPOS>something

or is all that's guaranteed the initial @?

http://tripos.com/data/support/mol2.pdf has everything TRIPOS, but that's 
defining the 'Tripos Mol2 File Format' and I don't know if someone else has 
defined a different class of records for their program's own use.
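
One way to find out empirically is to scan a collection of mol2 files for header lines that begin with @ but not with @<TRIPOS>. This is a minimal sketch; the @<MYPROG>NOTES record in the sample is invented for illustration and is not known to be used by any real program.

```python
def section_headers(mol2_text):
    """Return every record-type indicator: lines whose first character is '@'."""
    return [line.strip() for line in mol2_text.splitlines() if line.startswith("@")]


def non_tripos_headers(mol2_text):
    """Return the headers that do not use the '@<TRIPOS>' prefix."""
    return [h for h in section_headers(mol2_text) if not h.startswith("@<TRIPOS>")]


# Illustrative input; the '@<MYPROG>NOTES' record is made up for this example.
example = """@<TRIPOS>MOLECULE
ligand
@<TRIPOS>ATOM
@<MYPROG>NOTES
"""
print(non_tripos_headers(example))  # ['@<MYPROG>NOTES']
```

A parser that only insists on the initial @ and skips sections it does not recognise would be safe either way.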

Tom

Re: [ccp4bb] Announcing a Web Server for the Grade ligand restraints generator.

2012-03-20 Thread Thomas Womack
On 20 Mar 2012, at 12:34, Eleanor Dodson wrote:

 I would like to use this to check an existing ligand. I have the PDB refined 
 according to a cif file, and that cif file used for input to REFMAC and 
 phenix.
 
 I don't want to lose the atom names assigned there so is it possible to start 
 GRADE with one of those inputs or do I have to convert it to a MOL2 file (I 
 guess that is a SYBYL file?)

You will have to convert it to a mol2 file; we have had very acceptable results 
doing this using openbabel, simply

obabel ligand.pdb -Oligand.mol2

at least in the case where the ligand has all its hydrogen atoms present and 
named.  If the ligand doesn't have hydrogen atoms, you will have to use 

obabel ligand_noH.pdb -h -Oligand_H.mol2

then edit ligand_H.mol2 so that all the hydrogen atoms have different names (I 
appreciate this is tedious, it will be automatic in the next version), then use 
that as input.
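
Until that is automatic, the renaming can be scripted. The following is only a rough sketch, not part of Grade: it assumes the mol2 file is whitespace-separated (so column alignment may be rewritten), that hydrogens carry a SYBYL atom type of H or H.something, and that naming them H1, H2, ... in file order is acceptable. The function name is hypothetical.

```python
def rename_hydrogens(mol2_lines):
    """Give each hydrogen in the @<TRIPOS>ATOM section a unique name H1, H2, ..."""
    out, in_atoms, counter = [], False, 0
    for line in mol2_lines:
        if line.startswith("@"):
            # Track whether we are inside the ATOM section.
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            out.append(line)
            continue
        fields = line.split()
        # ATOM records: atom_id atom_name x y z atom_type [subst_id ...]
        if in_atoms and len(fields) >= 6 and fields[5].split(".")[0] == "H":
            counter += 1
            fields[1] = f"H{counter}"
            line = " ".join(fields)  # assumption: free-format output is acceptable
        out.append(line)
    return out


# Usage (lines without terminators, e.g. from str.splitlines()):
# new_lines = rename_hydrogens(open("ligand_H.mol2").read().splitlines())
```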

Tom

Re: [ccp4bb] sudden drop in R/Rfree

2012-03-02 Thread Thomas Womack

On 2 Mar 2012, at 16:02, Regina Kettering wrote:

 Rajesh;
 
 I am not sure that you have a high enough data:refinement parameters ratio to 
 refine TLS.  It just adds more parameters to refine that can lead to 
 over-refinement of your model, especially at the 3.3 A. 

TLS only adds twenty parameters per chain; so it's a really parsimonious thing 
to do at low resolution.

I'd say that adding lots of waters at 3.3A (at four parameters per added water) 
was much more likely to be the cause of a very wide R/Rfree gap.

I'm a bit worried that a user working at low resolution on a protein with more 
than one chain per ASU is not using NCS from the very beginning; that's another 
good way of adding more restraints and effectively getting the 
parameters-to-data ratio down (because the 'parameters' in that ratio is really 
'parameters minus K * number of restraints'; there is scope for a lot of debate 
as to the right value of K, it clearly depends on the strength of the 
restraints)
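
To put rough numbers on this, here is a back-of-envelope sketch. The model size (two chains of 2000 atoms, 200 waters) is made up for illustration, it assumes one TLS group per chain and isotropic B-factors, and K is left as an input because its right value is exactly what is open to debate.

```python
def parameter_count(n_atoms, n_waters=0, n_tls_groups=0):
    """Refined parameters: x, y, z, B per atom or water, plus 20 per TLS group."""
    return 4 * (n_atoms + n_waters) + 20 * n_tls_groups


def effective_parameters(n_params, n_restraints, k):
    """'parameters minus K * number of restraints'; k is an assumed weight."""
    return n_params - k * n_restraints


# Hypothetical model: two chains of 2000 atoms each.
base = parameter_count(4000)
with_tls = parameter_count(4000, n_tls_groups=2)
with_waters = parameter_count(4000, n_waters=200)
print(with_tls - base)     # 40 extra parameters for TLS
print(with_waters - base)  # 800 extra parameters for 200 waters
```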

If he's using the Global Phasing refinement software, I would strongly suggest 
that Rajesh use targeting to the initial molecular replacement result 
throughout the refinement, as yet a third way of adding more restraints.

Tom Womack (Global Phasing)

 
 HTH,
 Regina
 
 From: Rajesh kumar ccp4...@hotmail.com
 To: CCP4BB@JISCMAIL.AC.UK 
 Sent: Friday, March 2, 2012 10:54 AM
 Subject: [ccp4bb] sudden drop in R/Rfree
 
 
 Dear All, 
 
 I have a 3.3 A data for a protein whose SG is P6522. Model used was wild type 
 structure of same protein at 2.3 A.
  
 After molecular replacement, first three rounds of refinement the R/Rf was  
 26/32.8,  27.1/31.72 % and 7.35/30.88 % respectively.
 In the fourth round I refined with TLS and NCS and added water and the R/Rf 
 dropped to 19.34/26.46. It has almost 7% difference. I also see lot of 
 unanswerable density in the map where lot of waters were placed. Model fits 
 to the map like a low resolution data with most of side chains don't have 
 best density.
 
 I was not expecting such a sudden drop in the R/Rfree and a difference is 
 7.2%. 
 I am wondering if I am in right direction. I am not sure if this usual for 
 3.3A data or in general any data if we consider the difference.
  I appreciate your valuable  suggestions.
 
 Thanks
 Raj
 
 
 
 



Re: [ccp4bb] Sub-angstrom resolution

2012-01-11 Thread Thomas Womack

On 11 Jan 2012, at 02:13, Artem Evdokimov wrote:

 There are two sides to this question: the scientific one is actually easier to 
 answer in generic terms - but I also would like to point out the very recent 
 example of a mystery that required very high resolution (and orthogonal 
 techniques) to answer, namely the puzzle of the light atom in the center of 
 the MoFe nitrogenase protein. Highly recommended reading.
 
That does sound interesting: could you give a reference?  I can find various 
papers about small slices of the puzzle, but not a review article.

Tom

Re: [ccp4bb] Sub-angstrom resolution

2012-01-11 Thread Thomas Womack
On 11 Jan 2012, at 11:36, Thomas Womack wrote:

 
 On 11 Jan 2012, at 02:13, Artem Evdokimov wrote:
 
 There are two sides to this question: the scientific one is actually easier 
 to answer in generic terms - but I also would like to point out the very 
 recent example of a mystery that required very high resolution (and 
 orthogonal techniques) to answer, namely the puzzle of the light atom in the 
 center of the MoFe nitrogenase protein. Highly recommended reading.
 
 That does sound interesting: could you give a reference?  I can find various 
 papers about small slices of the puzzle, but not a review article.

http://www.sciencemag.org/content/334/6058/974.full is the work using 
orthogonal techniques to figure out which the light atom actually was, with a 
discussion at
http://www.sciencemag.org/content/334/6058/914.full 

The high-resolution structure that revealed that there was a light atom there 
is from 2002: http://www.sciencemag.org/content/297/5587/1696.full with 
discussion at http://www.sciencemag.org/content/297/5587/1654.full

Tom

[ccp4bb] Making fixes to cif2mtz easier

2011-12-21 Thread Thomas Womack
The patch in 
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=CCP4BB;325e1870.1112 solves the 
problem with 3u57.

A few days ago I read the interesting paper 
http://journals.iucr.org/d/issues/2011/01/00/dz5216/dz5216bdy.html referring to 
a crystal structure containing a diselenide bond; I downloaded the model and 
structure factors for 2xsk, and cif2mtz refused to convert them.

This turns out to be because the Bijvoet pairs measured in the header were 
described as

 _refln.pdbx_F_meas_plus
 _refln.pdbx_F_meas_plus_sigma
 _refln.pdbx_F_meas_minus
 _refln.pdbx_F_meas_minus_sigma

while both the ccp4-6.2.0/lib/data/cif_mm.dic and the 
http://mmcif.pdb.org/dictionaries/ascii/mmcif_pdbx_v40.dic dictionaries require 
these to be

 _refln.pdbx_F_plus
 _refln.pdbx_F_plus_sigma
 _refln.pdbx_F_minus
 _refln.pdbx_F_minus_sigma

It was trivial to fix the problem with a text editor (and this is a case where 
the right answer is to get wwpdb to fix the sf.cif file at their end), but this 
led to some discussion at Global Phasing as to what could be done to make 
cif2mtz handle sf.cif files with unusual data items without requiring modifying 
and recompiling the source code each time.
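
For reference, the text-editor fix can be scripted along these lines. This is only a sketch of the renaming described above, not a proposal for how cif2mtz should work; it assumes each data item name sits alone on its own line in the loop_ header, and the function name is made up.

```python
# Non-standard name -> name given in the dictionaries (both lists as quoted above).
RENAMES = {
    "_refln.pdbx_F_meas_plus": "_refln.pdbx_F_plus",
    "_refln.pdbx_F_meas_plus_sigma": "_refln.pdbx_F_plus_sigma",
    "_refln.pdbx_F_meas_minus": "_refln.pdbx_F_minus",
    "_refln.pdbx_F_meas_minus_sigma": "_refln.pdbx_F_minus_sigma",
}


def fix_item_names(cif_text):
    """Rename whole-line data item names; data rows are left untouched."""
    fixed = []
    for line in cif_text.splitlines():
        name = line.strip()
        # Matching the whole stripped line avoids turning
        # ..._meas_plus_sigma into a half-renamed item.
        fixed.append(line.replace(name, RENAMES[name]) if name in RENAMES else line)
    return "\n".join(fixed) + ("\n" if cif_text.endswith("\n") else "")
```

A table of this kind, read at run time, is one way to handle unusual data items without recompiling.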

A summary of where we got to is at 

http://www.globalphasing.com/buster/wiki/index.cgi?CCP4cif2mtzImproveIdeas

I would appreciate any comments on how to proceed in this direction.

Tom

[ccp4bb] How to get cif2mtz to handle new fields

2011-09-23 Thread Thomas Womack
The current version of the mmcif_pdbx dictionary at 

http://mmcif.pdb.org/dictionaries/mmcif_pdbx.dic/Index/index.html

defines 

 _refln.pdbx_DELFWT
 _refln.pdbx_DELPHWT
 _refln.pdbx_FWT
 _refln.pdbx_PHWT

as fields in which you can deposit coefficients from the computation of 
weighted Fo-Fc and 2Fo-Fc maps; this is marvellous, since previously it's been 
very unclear how you deposit maps.

However, when I make an mmcif file with entries for these fields and pass it 
through the version of cif2mtz in ccp4-6.2.0, I get 

Line 77:data name _refln.pdbx_DELFWT not present in dictionary
Line 78:data name _refln.pdbx_FWT not present in dictionary
Line 79:data name _refln.pdbx_DELPHWT not present in dictionary
Line 80:data name _refln.pdbx_PHWT not present in dictionary

Is it possible to edit the dictionary?  It appears to be supplied as 
lib/cif_mmdic.lib which is a binary file; presumably that's produced with some 
kind of compiler from some kind of source file, but I'm not sure how to start 
looking for the compiler and the source.

Yours sincerely,

Thomas Womack (Global Phasing)

Re: [ccp4bb] Another paper structure retracted

2011-08-11 Thread Thomas Womack
On 11 Aug 2011, at 17:40, Diana Tomchick wrote:

 A quick glance at the header of the PDB file shows that there is one glaring 
 discrepancy between it and the table in the paper that hasn't been mentioned 
 yet in this forum. The data completeness (for data collection) reported in 
 the paper is 95.7%, but in the header of the PDB file (actually, in both the 
 2QNS and the 3KJ5 depositions) the data completeness (for data collection) is 
 reported as only 59.4%.

This is nastily anisotropic data; the Sawaya diffraction anisotropy server 
lists principal components 23.2, -9.9, -13.3 A^2; the resolution cut-off 
is roughly where the C*-axis goes to F/sigF=2 and there's a good deal of 
information left on the A* and B* axes.

There is also a large cone of missing data around the A* axis, and both missing 
and poorly-correlated reflections at low resolution - beamstop issues?

The peptide is arranged roughly parallel to the B axis.

It's not an irredeemably bad apo structure: there are a few peptide flips and I 
can rebuild quickly to R/Rfree 0.188/0.259 against the aniso-corrected data 
from the Sawaya server (first step in rebuilding was deleting the C chain, and 
it's not coming back).

Tom

Re: [ccp4bb] Off Topic: PDB validation server

2011-07-18 Thread Thomas Womack
On 8 Jul 2011, at 19:13, Katherine Sippel wrote:

 I know that the PDB updated its validation server in May as described in 
 their news link but it seemed to indicate an increase in output options 
 rather than a change in criteria. Is anyone aware of  what changes were made 
 to the validation server in regards to the preferred geometrical and 
 stereochemical features?

As far as I can tell empirically, if I run the validation server today it 
complains about

a) waters which make a perfectly good contact with a residue in a different ASU

b) waters which make a perfectly good contact with metal ions or with other 
waters which themselves make a perfectly good contact with the protein.

and this means it's really not much use for validation of large complicated 
proteins with hundreds of waters.

Tom

Re: [ccp4bb] Follow-up: non-waters among structured solvent atoms

2011-06-17 Thread Thomas Womack

On 16 Jun 2011, at 17:19, Pavel Afonine wrote:

 Hi,
 
 On Thu, Jun 16, 2011 at 7:49 AM, Jan Dohnalek dohnalek...@gmail.com wrote:
 
 Modeling more UNKNOWN atoms might be the future for these cases?
 
 one needs to specify chemical element type in 77-78 position, otherwise these 
 records are useless.

But if you know the chemical element type then there's no point in calling it 
UNK.

BUSTER uses the scattering factors for oxygen for modelling X, on the grounds 
that you'll have put in an X because it doesn't look enough unlike water to be 
obviously something else.  
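
For anyone checking their own files, the element field under discussion can be read like this. It is a minimal sketch, not how BUSTER does it; the HETATM record is invented and padded so that the element symbol lands in columns 77-78.

```python
def element_symbol(record):
    """Return the element symbol from columns 77-78 of an ATOM/HETATM record,
    or '' if the record is too short or the field is blank."""
    return record[76:78].strip()


# Hypothetical HETATM record, padded so the element lands in columns 77-78.
record = "HETATM 1234  UNK UNX A 501      10.000  20.000  30.000  1.00 25.00".ljust(76) + " O"
print(repr(element_symbol(record)))       # 'O'
print(repr(element_symbol(record[:66])))  # '' (no element field present)
```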

Tom

[ccp4bb] A small bug in the CCP4 dictionary?

2011-01-20 Thread Thomas Womack
The restraint dictionary for hydrogenated tryptophan, 
lib/data/monomers/t/TRP.cif, lists a 15-atom plane for the sidechain, omitting 
the atom HZ2.

Unless this is an exciting result derived from neutron diffraction experiments, 
would it be possible to fix the dictionary?

Yours sincerely,

Thomas Womack (Global Phasing)

Re: [ccp4bb] Coot cannot read mtz or pdb files

2010-10-04 Thread Thomas Womack
On 4 Oct 2010, at 11:15, Leiman Petr wrote:

 Dear all,
 
 Coot behaves in a very strange way on my student's MacBook (32bit) running
 MacOS X 10.6.4. Both versions of coot are affected - the precompiled Prof.
 Scott's one and the compiled from source.
 
 It cannot read in MTZ files (quote: This is not an mtz file). PDB files
 are garbled up on reading as well. Most (but not all) connections are
 broken. A screenshot is attached.

I think this is a locale issue; try running 'LANG=C coot'.

I suspect the parser is assuming that the decimal-point character is 'comma' 
not 'full-stop', which is why the atoms have been moved to exact-integer 
locations.
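
To illustrate the suspected mechanism (this is a toy imitation of a strtod-style parser, not Coot's actual code): if the parser expects a comma as the decimal point, it stops reading at the full stop and returns only the integer part.

```python
def parse_leading_number(text, decimal_point):
    """Mimic a strtod-style parser: read digits, accept one decimal_point,
    and stop at the first character that does not fit."""
    digits, seen_point = "", False
    for ch in text.strip():
        if ch.isdigit() or (ch in "+-" and not digits):
            digits += ch
        elif ch == decimal_point and not seen_point:
            digits += "."
            seen_point = True
        else:
            break
    return float(digits)


print(parse_leading_number("12.345", "."))  # 12.345
print(parse_leading_number("12.345", ","))  # 12.0
```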

Tom

Re: [ccp4bb] Deposition of riding H

2010-09-15 Thread Thomas Womack
On 15 Sep 2010, at 18:04, Ed Pozharski wrote:

 On Wed, 2010-09-15 at 07:57 -0700, Pavel Afonine wrote:
 if you refined your structure with H, then you should deposit it with 
 H
 
 sure.  But the structure is not *refined with hydrogens* when they are
 in predicted positions.  Following the same logic one could suggest that
 electron density should be deposited, since we can approximate it.

And I notice that a fair number of groups do deposit electron density - at 
least, they deposit PHIC and sometimes even HL coefficients in the sf.cif file. 
HL coefficients in the sf.cif file can get badly corrupted in the deposition 
process, but they definitely show willing.

 I think it's useful to limit the information presented in a pdb-file to
 what was actually refined + specific instructions on how the refinement
 was done.

I suppose I come to this from a background where every deposition is a fresh 
new test-case for new refinement software; it's only lack of download bandwidth 
and CPU power that makes me not want to start from the images.

I like the idea that what you deposit is the output of a well-defined 
refinement; which means that you need to deposit the instructions for doing the 
refinement, and the model you used as input.  There's a perfectly good PDB 
protocol for multi-MODEL files.  Nobody does such depositions, I think the PDB 
would complain if you tried, and there's the problem of endless regression.

I would be very happy if every PDB deposition with 'METHOD: MOLECULAR 
REPLACEMENT' had an extra MODEL in it containing the input to the molrep tool, 
and some REMARK lines describing how molrep was used; I would not complain if 
this was made compulsory for depositions which nowadays say 'STARTING MODEL: 
NULL'.  26 of the 130 depositions with method MOLECULAR REPLACEMENT this week 
have starting model NULL, as well as seven depositions with method FOURIER 
SYNTHESIS and starting model NULL.
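
A count like this can be made by scanning the REMARK 200 block of each header. The sketch below assumes the labels appear as 'METHOD USED TO DETERMINE THE STRUCTURE:' and 'STARTING MODEL:' on REMARK 200 lines; both function names are hypothetical, and this is not necessarily how the figures above were obtained.

```python
def remark_200_field(pdb_lines, label):
    """Return the text after 'label:' on the first REMARK 200 line carrying it."""
    for line in pdb_lines:
        if line.startswith("REMARK 200") and label + ":" in line:
            return line.split(label + ":", 1)[1].strip()
    return None


def lacks_starting_model(pdb_lines):
    """True for a molecular-replacement entry whose starting model is NULL or absent."""
    method = remark_200_field(pdb_lines, "METHOD USED TO DETERMINE THE STRUCTURE")
    model = remark_200_field(pdb_lines, "STARTING MODEL")
    return method == "MOLECULAR REPLACEMENT" and model in (None, "NULL")
```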

(why do MAD and SAD depositions still have a STARTING MODEL field?)

(while we're on the subject of riding hydrogens, I would invite people to 
admire the conformations of the hydrogens in such places as the C-alpha of 
residues A45 and A57 of deposition 2x5n - it's clearly a software bug rather 
than any mistake on the part of the authors, but nonetheless striking)

Tom

Re: [ccp4bb] Low-resolution structure refinement with Refmac

2010-08-27 Thread Thomas Womack
On 27 Aug 2010, at 10:55, Petr Kolenko wrote:

 Dear crystallographers,
 
 I have a structure at 3.3A resolution, 16 identical chains in AU, merohedral 
 twinning present. I started to refine using NCS restraints with chain A as a 
 reference chain. Current Rwork/Rfree is 21/25. There is almost nothing to 
 refine manually in whole structure now. But, refinement without NCS 
 restraints results in Rwork/Rfree of about 17/28. What should I do? Or is it 
 possible to deposit the structure refined using NCS restraints in final 
 refinement?

This seems like a really well-done NCS refinement; using the multi-fold NCS is 
allowing you to get what is an excellent Rfree and Rwork-Rfree gap for 3.3A 
data.  Definitely deposit the NCS-restrained version; refining without NCS 
restraints just increases the number of parameters by a factor of sixteen and 
spends most of those on fitting noise.

1% Ramachandran outliers at 3.3A also seems entirely reasonable.

Tom


Re: [ccp4bb] Should I be worried about negative electron density?

2010-05-19 Thread Thomas Womack
On 19 May 2010, at 00:36, Paul Emsley wrote:

 Jay Pan wrote:
 Hello Everyone,
 
 I have a reasonably well fitted electron density map through molecular 
 replacement. However, there is always some red region left no matter how 
 hard I tried when the mtz file is loaded into Coot. Is this because my model 
 is still not good enough or it’s natural to most model fittings. In other 
 words, should I be worried about the red region? Thanks in advance.
 
 Turn up the contour level and make it go away - that's what I do :)
 
 3 or 3.5 sigma peaks are typical.  Metals, carboxyls and disulfides are often 
 associated with relatively strong negative density, some people try to adjust 
 their model to compensate (and others not, of course). As a rule of thumb, if 
 you have 5 sigma peaks at the end of your refinement, that might be 
 worrying/interesting.

The median height of the tallest positive peak after autoBUSTER re-refinement 
for the PDB *depositions* during April this year is about 7.2, and of the 
tallest negative peak about -5.0.

25/50/75th quantiles: 

negative peaks -5.9 / -5.0 / -5.6
positive peaks 6.1 / 7.2 / 8.6

Tom Womack (Global Phasing)

Re: [ccp4bb] Distinguishing Between Na+ and H2O

2010-02-18 Thread Thomas Womack
The deposition 3fiy from the start of last year might be of interest:

FORMUL   2   NA    199(NA 1+)
FORMUL  20  HOH   *256(H2 O)  

It is annoying that the periodic table offers such a discrete range of sizes 
for 1+ ions; I hoped the lanthanide contraction would provide a heavy sodium 
substitute with lots of anomalous scattering, but (if I believe 
http://en.wikipedia.org/wiki/Ionic_radius) no ... Ag+ is the closest match in 
size (still 15% or so bigger) but silver(I) compounds are usually insoluble, 
La3+ is the same size as Na+, and LaCl3 nicely soluble, but obviously it 
coordinates very differently.

Tom

[ccp4bb] Haem-cysteine interactions

2010-01-14 Thread Thomas Womack
One of the features that Global Phasing's routine runs of deposited PDB 
structures often pick up is very close contacts between the SG of cysteine 
residues and the CAB and CAC atoms of the vinyl groups on HEM ligands, not 
described by LINK cards in the header of the deposited structure.  There is 
generally a CXXC motif in the protein which provides two cysteines to hold a 
haem in place.

Am I right that, in general, a haem bound to cysteines should be modelled as 
the molecular entity called by the PDB HEC, with the CAB and CAC atoms 
essentially tetrahedral, the CAB-CBB and CAC-CBC bond lengths the same as a 
carbon-carbon single bond, and a link from SG to CAB with angle and bond 
lengths around the SG as in methionine?

There are a number of high-resolution structures deposited recently containing 
haems near cysteines, and in most of them a re-refinement gives substantial 
positive difference density about the CM atoms; there is even occasionally a 
sign of some kind of longer tail coming out from the CMB position.  It seems to 
be purely positive density, rather than the dipole that tends to be diagnostic 
of anisotropy.

My current thought is that even a 1.4A structure of a haem-containing protein 
(I'm looking at an autoBUSTER re-refinement of the 3fo3 deposition from EMBL 
Hamburg) may well come from a crystal sufficiently well-ordered that we're 
seeing hydrogens - one of these structures has at least one isoleucine with 
green blobs at positions which would be reasonable for every hydrogen on the 
side-chain - but I would be interested to know other people's experience and 
interpretation.

Tom Womack (Global Phasing)

Re: [ccp4bb] Eleven plausible phasing elements remain unused

2009-04-02 Thread Thomas Womack
On Wed, 2009-04-01 at 14:33 -0700, Ethan Merritt wrote:
 On Wednesday 01 April 2009 07:21:16 Thomas Womack wrote:
  A perusal of the PDB reveals that the game of Periodic Table bingo still
  has eleven rounds to run:
  
  scandium, titanium, germanium, zirconium, niobium, neodymium,
  dysprosium, thulium, hafnium, bismuth and thorium remain absent from PDB
  entries.
 
 Does this imply that there is a PDB entry containing Radon?

I defined 'plausible' as a half-life greater than a billion years,
though I wouldn't have been totally amazed to see a plutonium or
technetium derivative.  Elements with half-lives 10^6 to 10^9 years for
the most stable isotope are Np, Tc, Cm, Pu; next shortest is 31kyears
for 231Pa.  The long-lived curium-247 and plutonium-244 isotopes are
neutron-heavy and inconvenient to produce.

The web-accessible subset of the ICSD features a technetium arsenide, a
plutonium boride, a sodium neptunate(VII) and an americium iodide.

Tom


[ccp4bb] Eleven plausible phasing elements remain unused

2009-04-01 Thread Thomas Womack
A perusal of the PDB reveals that the game of Periodic Table bingo still
has eleven rounds to run:

scandium, titanium, germanium, zirconium, niobium, neodymium,
dysprosium, thulium, hafnium, bismuth and thorium remain absent from PDB
entries.

OK, many of these are elements that would rather be refractory oxides or
jet-engine components than hexammines, and niobium chloride clusters
don't seem to be as water-stable as Ta6Br14, but why have neodymium,
dysprosium and thulium so consistently been left out there in the cold
rather than admitted to the warmish embrace of carboxyl groups?  There
must somewhere be a protein with a site that cries out for ThCl2(2+), an
unexpectedly water-stable cation.

Tom