Re: [ccp4bb] request for help about a 2-crystal drawing tool...

2019-08-25 Thread Ethan Merritt (UW)
On Sunday, 25 August 2019 18:28:35 Laurent Maveyraud wrote:
> Dear CCP4ers,
> 
> a long time ago (something like 10 years, or even more), I used an applet 
> available on the internet for drawing small schematic 2D crystals (see 
> below). You had the possibility to draw a small motif (in my case something 
> like a  histidine side-chain) and then define the lattice you wanted to 
> generate. I used it a lot for teaching purposes… but I would now like to 
> refresh my images and generate new examples… Of course, I was not smart 
> enough to take note of the applet name or address… and google was not able to 
> help my memory

I have a copy here:

http://skuld.bmsc.washington.edu/people/merritt/bc530/local_copies/escher/

The problem is, it dates back to the good old days before everyone
got so worried about insecure java applets that java was disabled in
all the standard browsers.  So chances are you cannot run it in your
browser; you would have to download the code and run it locally.

cheers,

Ethan


> 
> 
> Does this remind anybody of something? Any suggestions for software able to 
> generate such lattices are also welcome!
> 
> thanks a lot !
> Laurent
> 
> 
> Laurent Maveyraud
> PICT, Plateforme Intégrée de Criblage de Toulouse
> Université Paul Sabatier / CNRS / I.P.B.S. UMR 5089
> Département Biologie Structurale et Biophysique
> http://cribligand.ipbs.fr, http://www.ipbs.fr
> 205 route de Narbonne 31077 TOULOUSE Cedex FRANCE
> Tél: +33 (0)561 175 435  Mob.: +33 (0)646 042 111
> ---
> 
> 
> 1st French Congress on Integrative Structural Biology: please check 
> http://bsi-2019.ipbs.fr 
> 

-- 
Ethan A Merritt, Dept of Biochemistry
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742





Re: [ccp4bb] Ligand identification

2019-07-18 Thread Ethan Merritt (UW)
On Thursday, July 18, 2019 1:02:25 PM PDT Nicola Evans wrote:
> > I have an unidentified blob of density at a crystal contact region. I
> > tried inputting a magnesium ion there, it was clearly incorrect but it
> > improved the R-factors by 3, so I would really like to identify the
> > correct molecule! Is there a tool to identify ligands in structures? I
> > have tried the Phenix ligand identification tool to no avail (although
> > it did find some nice horseshoe shaped PEG ions). I used glycerol and
> > PEG400 as a cryo protectant, and identified a few glycerol and PEG
> > molecules in other parts of the structure but not in this spot (well
> > conditions: 0.2 M Magnesium chloride hexahydrate, 0.1 M BIS-TRIS pH
> > 5.5, 25% PEG 3,350). As this is at a crystal contact region the
> > molecule isn't likely to be biologically significant. The data have
> > been solved to 1.9Å. I have attached a screenshot of the mysterious
> > density. I would appreciate any suggestions! In addition, I am adding
> > water molecules to this protein, and often there are what appear to be
> > long chains. Are these likely to be long chains of ordered waters, or
> > more PEG molecules? Thanks in advance for your help! Nicola


You might enjoy reading the recent paper by Kowiel et al (2018)
"Automatic recognition of ligands in electron density by machine learning"
https://doi.org/10.1093/bioinformatics/bty626

One of the more convincing examples they give (Fig 6D) is identification of a
TRIS molecule.  By coincidence the density in this figure looks a lot like yours.

cheers,

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742





Re: [ccp4bb] Normalization of B-factors

2018-08-09 Thread Ethan Merritt
On Thursday, 09 August 2018 10:45:07 Pavel Afonine wrote:
> > I (personally) think the best answer from these was to look at the
> > TLS-subtracted residuals (ie. total B-factor - TLS component) — can’t
> > remember who sent it, off the top of my head.
> >
> 
> TLS is just an approximation, sometimes good and sometimes not. If TLS
> parameters are refined along with individual ADPs ("residual") the latter
> tend to compensate for eventual inadequacy of TLS model.
> 
> Pavel

Depending on the quality and resolution of your data, a different summary
may apply:

  If individual ADPs are refined along with a TLS model they may add
  noise that obscures the significance of bulk motion and may 
  artifactually reduce R-factors through over-fitting. 


The best method of comparing structures ultimately depends on what
question you are trying to answer.

If you are trying to document a reduction in overall flexibility due to
ligand binding, the TLS descriptions (rather than individual B factors)
may be the most informative thing to compare.  If you are looking
for specific residues that become "locked down" or "disengaged"
upon ligand binding, then the opposite is true: you would want to
compare the residual B-factors in those residues after subtracting
out the TLS contribution.

At sufficiently high resolution you should also look for evidence
that some residues may gain or lose alternate conformations upon
ligand binding.  High B factors may indicate that an alternate
conformation has been missed, or its occupancy may have changed.

Ethan

-- 
Ethan A Merritt, Dept of Biochemistry
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742





Re: [ccp4bb] Python3 and MTZ

2018-06-07 Thread Ethan Merritt
On Thursday, 07 June 2018 16:55:55 Markus Gerstel wrote:
> On 6 June 2018 at 20:28, Ethan Merritt  wrote:
> 
> > On Wednesday, 06 June 2018 18:54:32 Robbie Joosten wrote:
> > > Right you are Kay. It would be very weird to start developing things on
> > Python 2.7 right now. Its days are numbered: https://pythonclock.org/
> >
> > I would take a contrarian view.
> > Given the instability of python development, the promise to leave version
> > 2.7
> > alone makes it more desirable than the current ever-changing version.
> > You can be reasonably sure that anything you write for 2.7 will continue
> > to work, since they won't change the 2.7 infrastructure underneath you.
> >
> > But in truth I would recommend staying away from python for new projects
> > altogether, precisely because it is continually unstable.  The python
> > development philosophy places low priority on backwards-compatibility.
> > Combined with the explicit philosophy that python should only support one
> > way of accomplishing any given task, that is a recipe for frequent and
> > continual breakage.
> >
> > Here's an essay from a few years back that I think is still apposite.
> > https://jakevdp.github.io/blog/2013/01/03/will-scientists-ever-move-to-
> > python-3/
> 
> 
> Your point of view may be valid in that Python 2 -> 3 breaks existing code.
> However you sound like you mean 3.3 -> 3.4 -> 3.5 -> 3.6 -> 3.7 would be an
> issue, and I think that is a rarely argued view.
> I certainly can't find anything in the essay backing this up.

My basic concern is that python language development has broken existing
code in the past. Frequently.  They feel free to change things at any point,
even fundamental things like the meaning of the arithmetic operation "divide".  
You may be perfectly happy with the current state of the python 3 language,
but based on past history that state will change and change again and sooner
or later your current code will stop working.  My concern is not the
state of the language at any given point in time, but the fact that it
has been unstable for the past two decades and the worry that this instability
will continue into the future.
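
For instance, the change in integer division semantics is enough to silently
alter numerical results in otherwise untouched code:

    # Division semantics changed between Python 2 and Python 3.
    #   Python 2:  3 / 2  == 1    (integer "floor" division)
    #   Python 3:  3 / 2  == 1.5  ("true" division)
    # The floor-division operator // behaves the same way in both.
    print(3 / 2)    # 1 under Python 2, 1.5 under Python 3
    print(3 // 2)   # 1 under both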

I gather that one cluster of recent breakage is incompatible changes to
text handling.  I cannot speak to this issue from personal experience,
but here is a pointer to recent complaints/arguments/counter-arguments.
  https://lwn.net/Articles/741176/

> 
> For what it's worth, in my opinion:
> If you are new to Python - learn 3.
> If you are using Python2/3 compatible libraries - learn 3 and use the
> libraries from there.
> If you are using libraries that are not yet Python3 compatible - well then
> you have to use Python2 and please nag the developers to make it Python3
> compatible.

I kind of agree with all of those points.
The sticky point is if you are likely to be one of the people on the
receiving end of that nagging.

I.e. the advice to a new user may be different from the advice to
a developer embarking on a new project.

Ethan

> On 7 June 2018 at 11:43, Marcin Wojdyr  wrote:
> 
> > In other words, it's learning both Python2 and Python3 and using the
> > subset of the language that works with both interpreters.
> >
> 
> You only have to care about making your code 2/3 compatible if you are
> writing a library that someone else will import, ie. if you publish on pypi
> or elsewhere.
> Otherwise - as a newcomer - definitely do not bother with Python 2 and go
> straight to Python 3 only.
> 
> -Markus

-- 
Ethan A Merritt, Dept of Biochemistry
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742





Re: [ccp4bb] Python3 and MTZ

2018-06-06 Thread Ethan Merritt
On Wednesday, 06 June 2018 18:54:32 Robbie Joosten wrote:
> Right you are Kay. It would be very weird to start developing things on 
> Python 2.7 right now. Its days are numbered: https://pythonclock.org/

I would take a contrarian view.
Given the instability of python development, the promise to leave version 2.7
alone makes it more desirable than the current ever-changing version.
You can be reasonably sure that anything you write for 2.7 will continue
to work, since they won't change the 2.7 infrastructure underneath you.

But in truth I would recommend staying away from python for new projects
altogether, precisely because it is continually unstable.  The python
development philosophy places low priority on backwards-compatibility.
Combined with the explicit philosophy that python should only support one
way of accomplishing any given task, that is a recipe for frequent and
continual breakage.  

Here's an essay from a few years back that I think is still apposite.
https://jakevdp.github.io/blog/2013/01/03/will-scientists-ever-move-to-python-3/


Ethan


> 
> 
> Cheers,
> 
> Robbie
> 
> 
> 
> Sent from my Windows 10 phone
> 
> 
> 
> 
> From: CCP4 bulletin board  on behalf of Kay Diederichs 
> 
> Sent: Wednesday, June 6, 2018 8:47:07 PM
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] Python3 and MTZ
> 
> Dear Nicolas,
> 
> my (our) motivation is purely that when learning Python today, and developing 
> something from scratch, Python3 appears like the better choice (compared to 
> version 2) - provided that basic crystallographic libraries can be used.
> 
> Just a note (for those whose operating system provides only one of the two 
> Python flavours): RHEL7 has Python2 as system library, but Python3 can be 
> installed in parallel (using "Software Collections"). The user makes a choice 
> by setting the PATH variable.
> 
> best,
> 
> Kay
> 
> On Wed, 6 Jun 2018 15:43:16 +0200, Nicolas FOOS  wrote:
> 
> >Dear Kay,
> >
> >depending on the motivation to develop in python3 (it could be due to an OS
> >using python3 by default, or you may really prefer to work with python3). If
> >it's due to the OS, a possible strategy is to use virtualenv
> >(https://virtualenv.pypa.io/en/stable/), which lets you use python2 even
> >if python3 is the default version for the OS. There are probably other
> >methods to get a contained installation of python2 with all the libraries it needs.
> >
> >I used this strategy (virtualenv) to install ccp4 (with the installer
> >which needed python2) on a manjaro linux (Arch based) running python3
> >and that works very well.
> >
> >Nicolas
> >
> >Nicolas Foos
> >PhD
> >Structural Biology Group
> >European Synchrotron Radiation Facility (E.S.R.F)
> >71, avenue des Martyrs
> >CS 40220
> >38043 GRENOBLE Cedex 9
> >+33 (0)6 76 88 14 87
> >+33 (0)4 76 88 45 19
> >
> >On 06/06/2018 14:25, Kay Diederichs wrote:
> >> Dear all,
> >>
> >> I haven't tried to read MTZ files from Python until now, but for a new
> >> project in my lab I'd like to do that - and with Python3.
> >>
> >> Googling around, it seems that iotbx from cctbx is not (yet)
> >> Python3-compatible.
> >>
> >> So, what are my options?
> >>
> >> thanks,
> >>
> >> Kay
> >
> >
> >

-- 
Ethan A Merritt, Dept of Biochemistry
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742





Re: [ccp4bb] Electron density

2018-05-13 Thread Ethan Merritt
On Monday, 14 May 2018 10:39:32 Daniel Garcia wrote:
> Dear all,
> 
> I am currently refining a structure and found an intriguing electron density
> at the protein surface (pictures attached, the Fo-Fc map is contoured at
> >3.5 sigma). My first candidates were molecules from my protein prep or
> crystallisation buffer, but none of them seem to fit well. I can observe
> that the ligand is nearby the side chains of a tyrosine, a lysine, a
> threonine and a glutamate residue, and it is close to the carbonyl oxygens
> of the protein backbone of a nearby loop. The shape of this density is not
> pyramidal, but it is not planar either.
> 
> Do you have any suggestions to solve this density based on your own
> experience? My crystallisation buffer contains tartrate, ammonium sulphate,
> and CHES, and my protein is in Tris buffer containing DTT and sodium
> chloride.

DMSO?


> 
> Best regards,
> 
> 

-- 
Ethan A Merritt, Dept of Biochemistry
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] B-factor standardization

2018-04-05 Thread Ethan Merritt
On Thursday, 05 April 2018 15:49:44 Oliviero Carugo wrote:
> Dears,
> 
> everybody knows that B-factors may change amongst different crystal 
> structures and that they need to be standardized when different protein 
> crystal structures are compared.
> 
> If I am not wrong, I remember that someone proposed to standardize 
> B-factors of protein atoms as “BS = B - Bave”, where Bave is the average 
> B-factor of the protein. Such standardization is based on the hypothesis 
> that independent sources of disorder add in determining the final 
> B-factor. BS should represent atomic B-factors purged of all factors 
> other than atom oscillation, since Bave differences are neutralized.

That sounds like flawed logic to me.
If you were going to do this at all, I think you would have to 
start by comparing the residual B factors after removing TLS
contributions.

> Can anyone help me find a publication (80s or 90s) where 
> these BS values are used?

I think there are better tools available now than there were 
25 years ago.  I wouldn't go back that far to choose one without
first considering the work and thought that has been put into
structure analysis in the meantime.  If nothing else, you might
consider that "Bave" as calculated from the "Biso" (really Beq)
column in typical PDB files is a less than perfect approximation
[Acta Cryst. (2011) A67:512-516].
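
For reference, the standardization described in the question (BS = B - Bave)
is trivial to compute from a coordinate file.  A minimal sketch, reading the
Biso/Beq column of a PDB-format file as-is (with the caveat above, and leaving
aside whether this is a sensible thing to do at all):

    def standardized_b_factors(pdb_path):
        """Return (atom_name, resseq, B - Bave) for ATOM records.

        Minimal sketch: parses the B column of a PDB-format file and
        subtracts the mean.  No handling of occupancies, alternate
        conformers, or TLS contributions is attempted.
        """
        atoms = []
        with open(pdb_path) as fh:
            for line in fh:
                if line.startswith("ATOM"):
                    name = line[12:16].strip()     # atom name, columns 13-16
                    resseq = int(line[22:26])      # residue number, columns 23-26
                    b = float(line[60:66])         # B factor, columns 61-66
                    atoms.append((name, resseq, b))
        b_ave = sum(b for _, _, b in atoms) / len(atoms)
        return [(name, resseq, b - b_ave) for name, resseq, b in atoms]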

Ethan

> Thanks!
> 
> Oliviero

-- 
Ethan A Merritt, Dept of Biochemistry
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] PDB redo and biased/unbiased R-free

2018-02-06 Thread Ethan Merritt
On Tuesday, 06 February 2018 17:25:13 Nadia Leloup wrote:
> Dear all,
> 
> I was looking at a 3 angstrom structure from 2015 with relatively bad
> statistics, so I decided to look at the pdb-redo of said structure.
> Surprisingly, pdb-redo statistics are even worse. 

You say "even worse", but the subset of statistics you show does not
look so bad for a 3A structure either before or after re-refinement.
Other than maybe the Rfree itself, which is what you ask about.

> As you can see on the
> attached picture, the pdb-redo Rfree comes with a caveat:
> 
> R-free was considered biased, the estimated unbiased R-free was used
> 
> I understand that the R-free was considered biased because a new (Rfree)
> test set was determined. However, I'm not sure what the unbiased R-free
> is / how it is calculated in this case?

From the "How does it work" page on the PDB Redo web site,
my understanding is that rather than calculating Rfree from the
re-refinement, various other quality measures are used to 
calculate an expected Rfree/R ratio consistent with other 
structures of similar quality.
This ratio is then multiplied by R to yield an estimated Rfree.

Robbie Joosten will probably correct me if I have that wrong :-)

cheers,

Ethan

> 
> Thanks in advance,
> 
> Best,
> 
> Nadia

-- 
Ethan A Merritt, Dept of Biochemistry
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] Basic Crystallography/Imaging Conundrum

2017-11-09 Thread Ethan Merritt
On Friday, 10 November 2017 05:29:09 Keller, Jacob wrote:
> >>62500 is < 40^3, so ±20 indices on each axis.
> 50Å / 20 = 2.5Å,  so not quite 2.5Å resolution
> 
> Nice--thanks for calculating that. Couldn't remember how to do it off-hand, 
> and I guess my over-estimate comes from most protein crystals having some 
> symmetry. I don't really think it affects the question though--do you?
> 
> >>All that proves is that assigning each 1x1x1 voxel a separate density value 
> >>is a very inefficient use of information.  Adjacent voxels are not 
> >>independent, and no possible assignment of values will get around the 
> >>inherent blockiness of the representation.
> 
> Not sure what this means--what is the precise definition or measure of 
> "efficient use of information?" Like a compression algorithm? 

If it helps you to think of it that way, fine.
Suppose it is possible to compress a data set losslessly.
The information content is unchanged, but the compressed representation
is smaller than the original, so the information content per unit of size
is higher - a better use of space - hence "more efficient".

> Are diffraction data sets like compressed data?

Not the diffraction data, no.

But it is true that a truncated Fourier series is one way of compressing data.
Because of the truncation, it is a lossy, rather than lossless, compression.
An infinite series could give infinite resolution, but a truncated series is 
limited by the resolution of terms that are kept after truncation.

For example the compression used in JPEG is a truncated discrete cosine
transform (DCT), making JPEG files smaller than the original pixel-by-pixel 
image.
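
A small numerical sketch of the same idea in one dimension, using numpy's FFT
rather than a true DCT (the "density" below is invented, not from any real
map): keep only the low-frequency terms and the reconstruction stays
recognizable while storing far fewer numbers than one value per grid point.

    import numpy as np

    # A 1D stand-in for a density profile: two sharp "atoms".
    x = np.linspace(0.0, 1.0, 256, endpoint=False)
    rho = np.exp(-((x - 0.30) / 0.02) ** 2) + 0.7 * np.exp(-((x - 0.62) / 0.02) ** 2)

    # Full transform, then truncate: keep only the 16 lowest-frequency terms.
    F = np.fft.rfft(rho)
    F_trunc = F.copy()
    F_trunc[16:] = 0.0                          # discard high-resolution terms
    rho_lossy = np.fft.irfft(F_trunc, n=rho.size)

    # Lossy, low-resolution, but it keeps the overall shape.
    print("kept", 16, "of", F.size, "coefficients")
    print("rms error:", np.sqrt(np.mean((rho - rho_lossy) ** 2)))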

I'll throw a brain-teaser back at you.

As just noted, encoding the continuous electron density distribution in a
unit cell as a truncated Fourier series is essentially creating a JPEG image of
the original.  It is lossy, but as we know from experience JPEG images are 
pretty good at retaining the "feel" of the original even with fairly severe
truncation.

But newer compression algorithms like JPEG2000 don't use DCTs,
instead they use wavelets.   I won't get sidetracked by trying to describe
wavelets, but the point is that by switching from a series of cosines to
a series of wavelets you can get higher compression.  They are
more efficient in representing the original data at a selected resolution.   

So here's the brain-teaser:
Why does Nature use Fourier transforms rather than Wavelet transforms?
Or does she?
Have we crystallographers been fooled into describing our experiments
in terms of Fourier transforms when we could do better by using wavelets
or some other transform entirely?

Ethan
 


> Also, the "blockiness" of representation is totally ancillary--you can do all 
> of the smoothing you want, I think, and the voxel map will still be basically 
> lousy. No?

> >>I know!  Let's instead of assigning a magnitude per voxel, let's assign a 
> >>magnitude per something-resolution-sensitive, like a sin wave.   Then for 
> >>each hkl measurement we get one sin wave term.   Add up all the sine waves 
> >>and what do you get?  Ta da.  A nice map.
> 
> It was good of proto-crystallographers to invent diffraction as a way to 
> apply Fourier Series. I don't know--it seems funny to me that somehow 
> diffraction is able to harness "efficient information use," whereas the voxel 
> map is not. I am looking for more insight into this.
> 
> >>Aren't Fourier series marvelous?
> 
> Well, I have always liked FTs, but your explanations are not particularly 
> enlightening to me yet.
> 
> I will re-iterate that the reason I brought this up is that the imaging world 
> might learn a lot from crystallography's incredible extraction of all 
> possible information through the use of priors and modelling.
> 
> Also, I hope you noticed that all of the parameters about the 
> crystallographic data set were extremely optimistic, and in reality the 
> information content would be far less.
> 
> One could compare the information content of the derived structure to that of 
> the measurements to get a metric for "information extraction," perhaps, and 
> this could be applied across many types of experiments in different fields. I 
> nominate crystallography for the best ratio.
> 
> JPK
> 
> 
> 
>  
> > Assuming that it is apt, however: is this a possible way to see the power 
> > of all of our Bayesian modelling? Could one use our modelling tools on such 
> > a grainy picture and arrive at similar results?
> >
> > Are our data sets really this poor in information, and we just model the 
> > heck out of them, as perhaps evidenced by our scarily low data:parameters 
> > ratios?
> > 
> > My underlying motivation in this thought experiment is to illustrate the 
> > richness in information (and poorness of modelling) that one achieves in 
> > fluorescence microscopic imaging. If crystallography is any measure of the 
> > power of modelling, one could really go to town on some of these terabyte 

Re: [ccp4bb] Basic Crystallography/Imaging Conundrum

2017-11-09 Thread Ethan Merritt
On Friday, 10 November 2017 00:10:22 Keller, Jacob wrote:
> Dear Crystallographers,
> 
> I have been considering a thought-experiment of sorts for a while, and wonder 
> what you will think about it:
> 
> Consider a diffraction data set which contains 62,500 unique reflections from 
> a 50 x 50 x 50 Angstrom unit cell, with each intensity measured perfectly 
> with 16-bit depth. (I am not sure what resolution this corresponds to, but it 
> would be quite high even in p1, I think--probably beyond 1.0 Angstrom?).

Meh. 
62500 is < 40^3, so ±20 indices on each axis.
50Å / 20 = 2.5Å,  so not quite 2.5Å resolution
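
Spelled out as a back-of-the-envelope script (P1 assumed, Friedel pairs
counted separately, and a cube of indices rather than a sphere -- it simply
reproduces the estimate above):

    n_refl = 62500
    a = 50.0                            # cell edge, Angstrom
    per_axis = n_refl ** (1.0 / 3.0)    # ~39.7 index values per axis
    h_max = per_axis / 2.0              # indices run roughly -20..+19
    d_min = a / h_max                   # ~2.5 Angstrom
    print(per_axis, h_max, d_min)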


> Thus, there are 62,500 x 16 bits (125 KB) of information in this alone, and 
> there is an HKL index associated with each intensity, so that I suppose 
> contains information as well. One could throw in phases at 16-bit as well, 
> and get a total of 250 KB for this dataset.
> 
> Now consider a parallel (equivalent?) data set, but this time instead of 
> reflection intensities you have a real space voxel map of the same 50 x 50 x 
> 50 unit cell consisting of 125,000 voxels, each of which has a 16-bit 
> electron density value, and an associated xyz index analogous to the hkl 
> above. That makes a total of 250 KB, with each voxel a 1 Angstrom cube. It 
> seems to me this level of graininess would be really hard to interpret, 
> especially for a static picture of a protein structure. (see attached: top is 
> a ~1 Ang/pixel down-sampled version of the image below).

All that proves is that assigning each 1x1x1 voxel a separate density value is
a very inefficient use of information.  Adjacent voxels are not independent,
and no possible assignment of values will get around the inherent blockiness
of the representation.

I know!  Instead of assigning a magnitude per voxel, let's assign a magnitude
per something-resolution-sensitive, like a sine wave.   Then for each hkl
measurement we get one sine wave term.   Add up all the sine waves and what do
you get?  Ta da.  A nice map.
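
In one dimension the recipe really is just a few lines; the amplitudes and
phases below are invented, and the only point is that each measured index
contributes one wave and the sum of the waves is the map:

    import numpy as np

    # One (h, amplitude, phase) term per "measurement" -- hypothetical values.
    terms = [(1, 1.0, 0.0), (2, 0.6, 1.2), (3, 0.9, -0.5), (5, 0.4, 2.0)]

    x = np.linspace(0.0, 1.0, 200, endpoint=False)   # fractional coordinate
    rho = np.zeros_like(x)
    for h, amp, phase in terms:
        rho += amp * np.cos(2.0 * np.pi * h * x + phase)  # one wave per index

    # rho is now the (1D) map: no per-voxel bookkeeping was needed.
    print(rho[:5])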
 
> Or, if we wanted smaller voxels still, let's say by half, we would have to 
> reduce the bit depth to 2 bits. But this would still only yield half-Angstrom 
> voxels, each with only four possible electron density values.
> 
> Is this comparison apt? Off the cuff, I cannot see how a 50 x 50 pixel image 
> corresponds at all to the way our maps look, especially at around 1 Ang 
> resolution. Please, if you can shoot down the analogy, do.

Aren't Fourier series marvelous?

 
> Assuming that it is apt, however: is this a possible way to see the power of 
> all of our Bayesian modelling? Could one use our modelling tools on such a 
> grainy picture and arrive at similar results?
>
> Are our data sets really this poor in information, and we just model the heck 
> out of them, as perhaps evidenced by our scarily low data:parameters ratios?
> 
> My underlying motivation in this thought experiment is to illustrate the 
> richness in information (and poorness of modelling) that one achieves in 
> fluorescence microscopic imaging. If crystallography is any measure of the 
> power of modelling, one could really go to town on some of these terabyte 5D 
> functional data sets we see around here at Janelia (and on YouTube).
> 
> What do you think?
> 
> Jacob Keller
> 
> +
> Jacob Pearson Keller
> Research Scientist / Looger Lab
> HHMI Janelia Research Campus
> 19700 Helix Dr, Ashburn, VA 20147
> (571)209-4000 x3159
> +
> 

-- 
Ethan A Merritt, Dept of Biochemistry
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] A challenging Molecular replacement

2017-07-17 Thread Ethan Merritt
On Tuesday, 18 July 2017 00:01:59 CDaddy wrote:
> I am a structural biologist who is teaching X-ray crystallography. Recently I 
> noticed that BrlR structure (5XQL) was solved using molecular replacement 
> with a search model of very low similarity. I am very interested in this 
> structure because I think this a very good example to show students how to 
> solve phase problem using molecular replacement, especially when the model 
> and the target protein share a low sequence identity. However, when I 
> downloaded the data from PDB, I found that I cannot solve the phase problem 
> using Phaser as mentioned by the authors. During this procedure BmrR 
> (PDB:1R8E) was used as the search model. I tried to consult the authors for 
> help but have received no response so far. Since the description of this issue in 
> the literature is very brief, could anyone please spend a little time on this 
> molecular replacement and give me some advice on this issue? I would like to learn 
> some valuable tricks. Your assistance will be highly appreciated.


I have no familiarity with either structure, but even a cursory glance at the
cartoon depiction of 5XQL in the PDB suggests that you would want to chop
it into at least 3 pieces in order to use it for molecular replacement.
Did you try placing the N- and C- terminal domains separately after
chopping out the long connecting helix?

Ethan

-- 
Ethan A Merritt, Dept of Biochemistry
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


[ccp4bb] TLSMD server status

2017-01-28 Thread Ethan Merritt
The TLSMD server has been bandaged and splinted and is
back on its feet, if hobbling a bit.
This is a short-term fix.  Replacement hardware expected
sometime next week.  Go ahead and submit your jobs, but
don't be too surprised if the server has a relapse.

Thanks to everyone who offered to discuss mirroring the
server elsewhere. I will follow up on that once things
stabilize. 

Ethan

-- 
Ethan A Merritt, Dept of Biochemistry
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] CCP4BB Digest - 24 Jan 2017 to 25 Jan 2017 (#2017-26)

2017-01-26 Thread Ethan Merritt
On Thursday, 26 January 2017 11:03:12 PM Claire Smith wrote:
> Hello,
> 
> I would like to ask a question related to a recent thread regarding
> bad/missing density.
> 
> I have used SAD to solve the structure of a protein at about 2.6Å
> resolution. Phenix built a good portion of it (about 50%) and the density
> in this region is good. However, we cannot see the remaining 50%.  Rfree is
> currently at 32%.  No twinning is suggested.
> 
> How can we "find" the missing 50%? We have tried MR-SAD with no significant
> improvement. We know the missing mass is there because we ran the crystals
> on a  gel, and the protein is intact.
> 
> I know with MR sometimes the space group can make a difference, but with
> experimental phasing (SAD),  if the space group were incorrectly
> identified, would we have gotten a solution to the Se substructure? This
> seems correct because the visible 50% of the protein makes total sense. So,
> since we have a correct substructure, can I conclude with confidence that
> the space group is correctly identified?

From your description it sounds unlikely that the unit cell is correct but
the space group is wrong.   However the symptoms do match a possible
missed supercell, such that you collected and refined data from only 
a subset of the true unique data.

I will try to describe this in words but a picture would be so much easier...
I will try ascii art.  Please view in a fixed spacing font.
Suppose the true cell looks like this:

   +------------+------------+
   |  AAA       |  AAA       |
   |       BB   |    B       |
   |  AAA       |  AAA       |
   +------------+------------+

I.e. suppose you have pseudo-translational NCS such that domain A
superimposes perfectly on itself but domain B does not.
If you incorrectly index that cell edge as being only 1/2 its true length,
you measure only half of the true data but it refines nicely to 
describe a fully-occupied A and a mess in the region where B should be.
(superimposed 1/2 intensity ghosts made noisier by the missing data).

Yes I've hit this in real life, with an even messier case of cell-edge
tripling rather than doubling.
If you want to pursue this possibility, you should go back to the
diffraction images and look really hard for weak spots in between the
indexed spots.

good luck,

Ethan

 
> Of course, it could be that the missing bit is very flexible and does not
> scatter coherently, but then, wouldn't we expect a lower Rfree?

An Rfree of 0.32 for 2.6A refinement doesn't sound that bad.

 
> Thanks so much!
> 
> Claire
> 

-- 
Ethan A Merritt, Dept of Biochemistry
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] Calculation of generalised R-factor?

2016-12-20 Thread Ethan Merritt
On Tuesday, 20 December 2016 10:28:44 PM Pavel Afonine wrote:
> Hi Dirk,
> 
> 
> I want to check the validity of the refinement of anisotropic B-factors vs.
> > TLS + isotropic B-factors using the Hamilton R-value ratio test as
> > described in Ethan Merritt's paper "To B or not to B", Acta Cryst. D, Vol
> > 68, pp 468. This test uses the generalised R-factors (assuming unit
> > weights), RG=(Sum(Fo-Fc)^2/Sum(Fo)^2)^1/2. Although Hamilton wrote that
> > at the end of refinement, one could also use the similar ratio of the usual
> > R-factors, I really would like to check the ratio of the RG-values after
> > refinement. As far as I can see, this value is not reported by the usual
> > refinement programs.
> 
> 
> 
> R factor is a global metric that, if considered alone, is not going to
> answer your question. Best is to consider all three:
> 
> 1) Rfree;
> 2) Rfree-Rwork;

> 3) Meaningfulness of refined TLS matrices. Note, as we discovered and
> documented recently, results of TLS refinements (TLS matrices) are
> nonsensical in 85% of PDB entries (yes, eighty-five are bad, believe it or 
> not!):

> From deep TLS validation to ensembles of atomic models built from elemental
> motions. A. Urzhumtsev, P. V. Afonine, A. H. Van Benschoten, J. S. Fraser and 
> P. D.
> Adams. Acta Cryst. (2015). D71, 1668-1683.

As you know, I disagree on this point.

The Urzhumtsev et al classification of "nonsensical" TLS matrices includes
many that make lots of sense but do not happen to describe a perfectly rigid 
body.
That's OK, because proteins are not perfectly rigid bodies.
The TLS models are useful approximations that capture 
essential features of a messy ensemble of protein atoms. 
Complaining that in practice the refined TLS values deviate from those that
would hypothetically be obtained from fitting perfectly rigid groups is beside
the point.

Of course some refinements really are bad and some models really are
unreasonable.  Validation tests can help you catch these and fix your
model or refinement.  But a validation criterion that is so strict that
it labels 85% of all protein refinements as "nonsensical" is not a very
useful test. 

> 
> I'd say if you pass "1-3)" you are more than good. If still in doubt, you
> can make an extra effort and do what's described in
> 
> Validation of crystallographic models containing TLS or other descriptions
> of anisotropy
> F. Zucker, P. C. Champ and E. A. Merritt
> Acta Cryst. (2010). D66, 889-900
> 
> which may reveal extra troubles.

Note that the primary validation test described in the Zucker paper
(we called it SKITTLS) is a check for the pairwise consistency of 
adjacent TLS groups.   It might flag as inconsistent two adjacent
groups that both pass the criteria in Urzhumtsev et al, or conversely
it might rate two groups that fail the Urzhumtsev criteria as being
nevertheless consistent in their description of atoms they jointly
apply to.
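
Coming back to the quantity Dirk originally asked about: with unit weights RG
is trivial to evaluate yourself once you have exported Fo and Fc from the
refinement (how you extract them from your particular program is left open
here).  A minimal sketch:

    import math

    def generalized_r(fo, fc):
        """Hamilton's generalized R factor with unit weights (a sketch):
           RG = sqrt( sum (|Fo| - |Fc|)^2 / sum |Fo|^2 )
        fo, fc: observed and calculated amplitudes in matching order.
        """
        num = sum((o - c) ** 2 for o, c in zip(fo, fc))
        den = sum(o ** 2 for o in fo)
        return math.sqrt(num / den)

    # The Hamilton test then compares the ratio RG(restricted)/RG(unrestricted)
    # against the tabulated significance points.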

Ethan

> All the best,
> Pavel

-- 
Ethan A Merritt, Dept of Biochemistry
Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] local coordinate uncertainty / SFCHECK questions

2016-11-21 Thread Ethan Merritt
On Monday, 21 November 2016 11:53:22 PM Seok-Yong Lee wrote:
> Hi All,
> 
> We have recently solved several structures of a membrane protein in slightly 
> different conformations at low resolutions (3.9-4.2 A). We would like to see 
> that these structures reflect truly different conformations and to what 
> extent these structures are discernible. To answer this question, we would 
> need to estimate local and global coordinate uncertainty of these structures 
> to see if the local/global conformational differences of the structures are 
> bigger than the local/global coordinate uncertainty.
> 
> I am wondering if there is any good way/program to show local coordinate 
> uncertainty. I found that using SFCHECK I can get (i) amplitude of 
> displacement of atom from electron density and (ii) correlation coefficient 
> per residue. I have following questions.
> 
>   1)Can we use this amplitude of displacement as local 
> coordinate uncertainty? If not, is there a way to use this displacement 
> amplitude to get an estimate of the local coordinate error?
> 
>   2)The output regarding this plot in SFCHECK is somewhat difficult 
> to understand, as it shows a bar graph with multiples of sigma per residue. 
> What does it mean by those residues with no sigma? Do these residues have too 
> much or too little error?
> 
>   3)Is there a way to convert the output as a text file so that I can 
> plot it myself?
> 
>   4)Any recommendation with other programs that can produce local 
> coordinate uncertainty per residue?
> 
>   Any advice would be greatly appreciated.  Thank you in advance. 

I don't think that the uncertainty/error in individual coordinates is a good
way to describe or quantify the existence of "truly different conformations",
unless what you really meant is "different rotamers" or some other very local
property.

Nevertheless, if you want to estimate coordinate error from the available
refined quantities, I suggest looking at
Cruickshank, D. W. J. (1999). Acta Cryst. D55, 583–601.
This paper describes an empirical estimate, the DPI, that seems close to what
you are asking for.  A further empirical simplification was suggested by David
Blow: Acta Cryst. (2002). D58, 792-797
https://doi.org/10.1107/S0907444902003931
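
For what it's worth, the Rfree-based form of the DPI is simple enough to
evaluate by hand.  A sketch of my reading of the commonly quoted
approximation -- check the exact form and constant against the papers above
before leaning on it, and the example numbers are invented:

    import math

    def dpi_rfree(n_atoms, n_obs, completeness, d_min, r_free):
        """Rfree-based Cruickshank-style DPI (coordinate-error estimate).

        Commonly quoted form (verify against Cruickshank 1999 / Blow 2002):
            sigma(x) ~ sqrt(Ni / Nobs) * C**(-1/3) * dmin * Rfree
        Ni = fully occupied atomic sites, Nobs = unique reflections used,
        C = fractional completeness, dmin in Angstrom.
        """
        return math.sqrt(n_atoms / n_obs) * completeness ** (-1.0 / 3.0) * d_min * r_free

    # Invented numbers for a ~4 A structure, just to show the magnitude:
    print(dpi_rfree(n_atoms=8000, n_obs=25000, completeness=0.98,
                    d_min=4.0, r_free=0.30))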


If the intent is to ask whether a conformation resulting from final refinement
against data from one crystal is compatible with the data from a different
crystal, I suspect the most powerful and convincing test is to simply try it.
Transfer the relevant piece of the model from crystal A into your working model
for crystal B, refine a bit (maybe only B factors or TLS), and see whether the
resulting R/Rfree values are different from those you previously had for the
crystal B model.   You could then try the complementary exercise by placing the
original model from crystal B into the context of the current model and data
from crystal A.

Ethan Merritt

-- 
mail:   Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] SUMMARY: Equation Editor woes with Office 2011 for Mac

2015-05-20 Thread Ethan Merritt
On Wednesday, 20 May 2015 05:32:00 PM William G. Scott wrote:
  On May 20, 2015, at 5:38 AM, Randy Read rj...@cam.ac.uk wrote:
  
  Thanks, as always, to everyone for a thoughtful discussion!
 
 
 Alternatively, as a scientific community, perhaps it is finally time for us 
 to untwist Clippy, bending him backwards and forwards until he snaps at those 
 horrid beady little eyeballs, ditch the Comic Sans, flip Redmond the bird, 
 HTFU and learn to use LaTeX equation markup, and ask that our journals do the 
 same.  It really isn’t any harder than learning basic HTML (and predates it 
 as one of the original mark-up languages).

I don't know that we have much leverage over the whole range
of relevant publishers and journals, but certainly the IUCr
journals have in my experience always welcomed latex
submissions.  The IUCr web site provides templates and examples.

I highly recommend LyX as an alternative to Word and its ilk.
I've been using it for years to write and prepare papers for
submission, to Acta and elsewhere.

I happen to know that the IUCr website also provides a LyX
layout template to improve the WYSIWYG experience in LyX
(because I donated it :-)

Contrary to rumors mentioned up-thread, I have LyX/latex 
documents going back 10+ years that open just fine in the
current version of LyX.  Yes, older documents may go through
a markup conversion step when opened, but the one-time
delay is no big deal.  You can save the updated version or not,
as you choose.

You don't really need to learn any latex to use LyX, although
it does help if you want to customize the output (i.e. you
aren't using a journal-supplied template).

None of this, however, addresses the problem of collaborating
with people who want to use MSWord.  I have not found anything
better than running MSWord in an emulator, and this of course
does not address bugs or other problems in MSWord itself.
(Well actually it does, kind of.  It used to be that fonts, 
PDF conversion, and some graphics worked better inside a linux+wine 
emulation than they did in the same MSWord executable running 
natively on Windows. I don't know if this is still true).

 Journals and funding agencies should not be demanding that 
 we use crappy broken and restrictive proprietary formats for 
 submitting papers and proposals.

Thankfully the NIH now wants PDF.  I use LyX for my NIH proposals also.
I could package up a template and bibtex style sheet if
there is interest.

Ethan


 
 Ascii text documents provide the ultimate form of universal 
 interchangeability.
 
 The syntax is actually quite straightforward and easy to learn (or look up), 
 eg:
 
 http://en.wikibooks.org/wiki/LaTeX/Mathematics
 
 LaTeX allows you to focus on content rather than document formatting.  
 Although it is definitely more badass to do this in vim, other ascii text 
 editors often have very useful LaTeX functionality.  (My favorite on OS X is 
 TextMate, version 2 of which is now free. If you code on OS X, you should 
 take a look at this.)
 
 Once you make the small investment of time learning LaTeX, it makes other 
 tasks easier.  For example, you can use jsMath to embed LaTeX-encoded 
 equations (including chemistry symbols) in web pages, eg:
 
 http://www.math.union.edu/~dpvc/jsmath/examples/welcome.html
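
For anyone who has never seen it, the equation markup being advocated above is
compact.  A small illustrative fragment (not taken from any particular
journal template):

    % A familiar equation written in LaTeX markup:
    \begin{equation}
      F(hkl) = \sum_{j=1}^{N} f_j \exp\bigl[ 2\pi i (h x_j + k y_j + l z_j) \bigr]
    \end{equation}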

-- 
mail:   Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] strange pattern

2015-04-01 Thread Ethan Merritt
On Thursday, 02 April 2015 02:21:48 AM Gert Vriend wrote:
 The following article was rejected by 'our' journal...
 
 http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1097-0134/homepage/april_fools__day_special_papers.htm

A most excellent contribution!

Do you have a refmac-compatible *.cif dictionary describing
the tetrahedral sulfur center as in 3d7s (Figure 3D)?
I would hate to miss one of these should I suspect the
opportunity for novel quantum chemistry in a structure 
being refined.

Ethan

 
 Greetings
 Gert

-- 
mail:   Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] A basic question about Fourier Transform

2015-01-20 Thread Ethan Merritt
On Tuesday, 20 January 2015 10:18:35 PM Chen Zhao wrote:
 Dear all,
 
 I am sorry about this slightly off-topic question. I am now a graduate TA
 for a crystallography course and one student asked me a question that I
 didn't ask myself before. I don't have enough knowledge to precisely answer
 this question, so I am seeking help here.
 
 The question is, as I rephrased it, assuming we are able to measure the
 diffraction pattern of a single molecule with acceptable accuracy and
 precision (comparable to what we have now for the common crystals), is it
 better than measuring the diffraction spots from a crystal, given that the
 spots are just a sampling of the continuous pattern from a single molecule
 and there is loss of information in the space between the spots that are
 not sampled by the lattice?

While it is true that there is a loss of information because of the space
between the Bragg reflections, this is not as bad as you might think.
The Nyquist theorem tells us that we can reconstruct a Fourier term exactly
if we can sample at one half the period of that term.
So for any given resolution of Bragg spots, the continuous transform
to half that resolution can be reconstructed.  Here "can be reconstructed"
implicitly includes "... if we know the phase".  So it comes back to the
phase problem.  If we could measure the phase, it would only matter to a
factor of 2 in resolution that we are not measuring the continuous transform.
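
A small numerical illustration of the sampling-theorem point, using a generic
band-limited 1D signal and Whittaker-Shannon interpolation (nothing
crystallographic about the numbers; the residual error comes from truncating
the interpolation sum):

    import numpy as np

    f_max = 5.0
    def signal(t):
        # Band-limited "signal": a few cosine terms, highest frequency f_max.
        return (np.cos(2 * np.pi * 1.0 * t)
                + 0.5 * np.cos(2 * np.pi * 3.0 * t + 0.7)
                + 0.3 * np.cos(2 * np.pi * f_max * t + 1.9))

    # Sample a little faster than the Nyquist rate 2*f_max.
    fs = 2.5 * f_max
    T = 1.0 / fs
    n = np.arange(-2000, 2001)
    samples = signal(n * T)

    # Whittaker-Shannon reconstruction at arbitrary points from samples alone.
    t = np.linspace(-1.0, 1.0, 400)
    recon = np.array([np.sum(samples * np.sinc((tt - n * T) / T)) for tt in t])

    print("max reconstruction error:", np.abs(recon - signal(t)).max())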

By the way, as Jacob Keller alluded to earlier, XFEL diffraction from 
nanocrystals introduces a situation half way between the two cases.
Because there are only a small number of unit cells in each direction,
the observed diffraction pattern indeed contains information in between
the Bragg peaks. One approach to interpreting this data is to treat the
measured diffraction pattern as a continuous transform of a single particle,
where that single particle just happens to be a nanocrystal containing
a small number of identical unit cells.

Ethan  

 Of course this is more of a thought experiment,
 so we don't need to consider that all measurement is discrete in nature
 owing to the limitation of the pixel size. I kinda agree with him and I
 have a feeling that this is related to the sampling theorem. I do
 appreciate your valuable comments. If this is not true, why? If this is
 true, what is its effect on electron density?
 
 Thank you so much for your attention and your help in advance!
 
 Best,
 Chen

-- 
mail:   Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] Normal mode refinement

2014-10-21 Thread Ethan Merritt
On Tuesday, 21 October 2014 07:39:53 AM Appu kumar wrote:
 Dear All,
 Thank you very much for valuable suggestions and educating me on the normal
 mode refinement. Actually, I am trying to refine a protein (cytosolic
 domain and trans-membrane domain). I found a solution through PHASER and
 density looks really good in both domains, but as I proceed with refinement
 the density remains great in both domains until Rfree is around 38%. Interestingly,
 with further refinement cycles, Rfree reduced to 30% but the density in the
 trans-membrane domain becomes very weak. That is why I am wondering whether
 it is possible to improve the density in the trans-membrane domain by using
 normal mode refinement. Conservatively speaking, it could be that the
 trans-membrane domain is highly flexible or disordered and, after much cerebration,
 I am thinking of incorporating normal mode refinement to see whether there
 is any improvement in the electron density of the trans-membrane domain.

Please keep in mind that if the density is poor because the protein really
is disordered, a perfect description of those disordered cell contents will
perfectly reproduce that poor density.   So improved description does not
necessarily imply improved map quality.

This is quite different from the case of a poor model for a well-ordered
structure.  Here also you will see a low quality map, but in this case it
will improve as your description of the cell contents improves.


Ethan



 I would follow suggestions of Dr, Mande and Dr. Ethan.  Also, would give a
 try to what Arpita has suggested.  I further, warmly welcome any suggestion
 on refinement procedure to improve electron density in flexible or
 disordered trans-membrane domain.
 Appu
 
 On 20 October 2014 23:41, Arpita Goswami bt.arp...@gmail.com wrote:
 
  Hello,
 
  You can also contact elNemo or NOMAD-Ref server developers about getting
  covariance/correlation matrices from normal mode analysis outputs to know
  the correctly coordinated mobile atoms. In this way you can compare with
  biological data also. In Shekhar's said paper K. Suhre (one of the
  developer of el-Nemo server) has done the same very correctly.
 
  best wishes,
  Arpita
 
  On Tue, Oct 21, 2014 at 5:40 AM, Appu kumar appu.kum...@gmail.com wrote:
 
  Dear CCP4 Users,
  I seek your valuable advice and suggestion in carrying out the normal
  mode structure refinement which manifest the dynamics of protein as linear
  combination of harmonic modes, used to describe the motion of protein
  structure in collective fashion. Studies suggest that it is highly useful
  in refining the protein structure which harbors a considerable magnitude of
  flexibility in atomic position owing to high thermal factors.
  Therefor I want to know is there any software/script available to execute
  the normal mode of refinement. Thanks a lot in advance for your imperative
  suggestions
 
  Appu
 
 
 
 
  --
  Arpita
 
  --
  Arpita Goswami
  Senior Research Fellow
  Structural Biology Laboratory
  Centre for DNA Fingerprinting and Diagnostics (CDFD)
  Tuljaguda (Opp MJ Market),
  Nampally, Hyderabad 500 001
  INDIA
  Phone: +91- 40- 24749401/404
  Mobile: 9390923667, 9502389184
  Email: arp...@cdfd.org.in
 

-- 
mail:   Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] Space group numbers

2014-10-04 Thread Ethan Merritt
On Saturday, 04 October 2014 10:26:51 AM Kay Diederichs wrote:
 I do agree that in your use case it may be helpful to order abc as
 long as the symmetry is unknown. I also do understand that the H-M
 symbols allow to describe the different settings, but this is a level of
 complication that is not necessary to understand for todays's typical
 crystallographer, because fortunately e.g. the C121 setting is
 practically uniformly used (and chosen by POINTLESS, as far as I
 understand) to represent C2 crystals.

I will not stick my nose into the main discussion here,
but I will note that this part of it is incorrect.
POINTLESS  uses the IUCr convention to select either C2 or I2
depending an the beta angle.  

This caused me great confusion when I first encountered it, not least
because a very slight variation in the refined beta angle can change the
spacegroup in the output file.

Ethan

-- 
mail:   Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] Hosed-Up X-Ray Structures: A Big Problem

2014-06-13 Thread Ethan Merritt
On Friday, 13 June 2014 10:12:50 AM Tim Gruene wrote:
 Hi Ethan,
 
 Maybe I miss something, but whenever an error in one of the cif-files
 has been reported, be it directly to Garib, or publicly on the ccp4bb,
 Garib (I assume) fixed it very quickly - I don't quite understand why we
 need a new term for this process?

See the other thread "ccp4 ligand tools + wwPDB validation = bug reports".

Because the error is not in a pre-packaged cif file.
Nor is it in a ccp4 program per se.
It is in a library that is used by cprodrg to generate a cif file
for previously unknown ligands.

This library originally came from the Dundee folks,
not ccp4, and it was not clear who if anyone was maintaining it.

In an admirably quick response, Alexander Schuettelkopf has now
expressed his willingness to respond to such bug reports and update
the library.

So that's good news for cprodrg, and I gather that indeed the fixes
will appear in future ccp4 updates.

But the problem is more general.
For example, I have had analogous problems with Grade.
There again it is clear that this can affect other ccp4-ers,
so ccp4bb seems to me a good place to mention any bugs or quirks that
contribute to structure refinement errors so that others are aware of
potential problems.  The eventual fix may have to come from elsewhere
(e.g. GlobalPhasing in the case of Grade).  Unlike prodrg, the Grade
code and libraries so far as I know are not available for inspection or
patching locally.

Paul Emsley has emailed me separately that there is a new project
ACEDRG in the offing that may take over the prodrg/Grade niche inside ccp4.
Perhaps someone involved in ACEDRG will post a summary of what it
will offer?


cheers,

Ethan

 
 On 06/12/2014 10:45 PM, Ethan A Merritt wrote:
  [...]
  Indeed.  All of the library-generation tools I am aware of are flawed in
  their own idiosyncratic ways.   I think I shall start a campaign to treat
  errors in the cif libraries as bugs, and encourage people to report
  these bugs in the libraries we all use just as they do for bugs in the
  programs we all use.  
  
  Ethan
  [...]
 

-- 
mail:   Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] crystallographic confusion

2014-04-19 Thread Ethan Merritt
On Saturday, 19 April 2014 02:52:38 PM Zbyszek Otwinowski wrote:
 Why not improve effective resolution to include consideration of solvent
 content? Due to constant packing density of proteins, it would become a
  synonym (by appropriate transformation) to number of observations per
 modelled atom.

Following that line of thought, perhaps reporting the observation/parameter
ratio would provide a more informative number than resolution.
Of course that leads to a morass of argumentation about whether to modify
it by the number and class of restraints used during refinement.
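
As a rough illustration of what such a number looks like -- the usual
back-of-the-envelope counting of reflections in the resolution sphere and four
parameters per atom, with restraints deliberately ignored (which is exactly
the morass mentioned above); the example numbers are invented:

    import math

    def obs_to_param_ratio(cell_volume, d_min, n_atoms, n_sym=1, anisotropic=False):
        """Back-of-the-envelope observations-per-parameter estimate.

        Reflections inside the resolution sphere: (4/3)*pi*Vcell/d^3,
        halved for Friedel pairs and divided by the symmetry order.
        Parameters: 4 per atom (x, y, z, B), or 9 if anisotropic.
        """
        n_obs = (4.0 / 3.0) * math.pi * cell_volume / d_min ** 3 / 2.0 / n_sym
        n_par = n_atoms * (9 if anisotropic else 4)
        return n_obs / n_par

    # e.g. a 100 x 100 x 100 A cell, 4 symmetry operators, 2.0 A data, 8000 atoms:
    print(obs_to_param_ratio(1.0e6, 2.0, 8000, n_sym=4))   # ~2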

Ethan

 
 Zbyszek Otwinowski
 
 
  Dear Dale, dear Kay,
 
  last year, we discussed this kind of problems (Urzhumtseva et al., 2013,
  Acta Cryst., D69, 1921-1934).
  Our approach does not tell you where to cut your data and which
  reflections to accept / reject but as soon as you have your set of
  reflections, you calculate very formally and very strictly the effective
  resolution of ANY diffraction data set, with ANY completeness, with ANY
  composition of measured / missed reflections.  For a complete data set,
  d_effective coincides with the d_high value but is different for
  incomplete data sets. The article contains a number of examples.
 
  With this approach, the discussion of the completeness of the
  highest-resolution shell becomes irrelevant; one can simply cite the
  effective resolution. I hope this can help.
 
  With best regards,
 
  Sacha Urzhumtsev
 
  
  De : CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] de la part de Dale
  Tronrud [de...@daletronrud.com]
  Envoyé : samedi 19 avril 2014 03:20
  À : CCP4BB@JISCMAIL.AC.UK
  Objet : Re: [ccp4bb] crystallographic confusion
 
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1
 
 
 I see no problem with saying that the model was refined against every
  spot on the detector that the data reduction program said was observed
  (and I realize there is argument about this) but declare that the
  resolution of the model is a number based on the traditional criteria.
 
 
  Dale Tronrud
 
 
 Zbyszek Otwinowski
 UT Southwestern Medical Center at Dallas
 5323 Harry Hines Blvd.
 Dallas, TX 75390-8816
 Tel. 214-645-6385
 Fax. 214-645-6353

-- 
mail:   Biomolecular Structure Center,  K-428 Health Sciences Bldg
MS 357742,   University of Washington, Seattle 98195-7742


Re: [ccp4bb] Fwd: [ccp4bb] CCP4-6.4.0 source code building failed in Mac OS X 10.8.5

2014-03-03 Thread Ethan Merritt
On Tuesday, 04 March 2014 01:33:58 PM wu donghui wrote:
 Dear Tim,
 
 Here I attached the config.log file for your help. I have tried to use
 either gcc-4.2.1 (Applications/Xcode.app/Contents/Developer/usr
 --with-gxx-include-dir=/usr/include/c++/4.2.1), or
 g++-4.2.1(/Applications/Xcode.app/Contents/Developer/usr
 --with-gxx-include-dir=/usr/include/c++/4.2.1) or gfortran-4.8.1 compiler.
 Still same error appeared as attached from the config.log file. Thanks for
 your attention.

Looks like either you do not have the g++ compiler completely installed,
or it's installed some place the configure script doesn't know about.
Does it work to compile something using gcc-4.2.1 directly from the
command line?  
If not then you need to get that working first.
If it does work, then you need to figure out which of the environmental
variables in your interactive session need to be added also to the
ccp4 setup script.

Note that it's not finding gcc for the C compiles either.
It's using the Apple compiler.


Ethan



 Best,
 
 Donghui
 
 
 
 On Mon, Mar 3, 2014 at 11:02 PM, Tim Gruene t...@shelx.uni-ac.gwdg.de wrote:
 
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1
 
  Dear Donghui,
 
  did you already take a look into config.log to read the error message
  why your gcc-compiler does not work? You should start at the end of the
  log file and scroll backwards until you find the error message.
 
  Best,
  Tim
 
  On 03/03/2014 03:51 PM, wu donghui wrote:
   -- Forwarded message --
   From: wu donghui wdh0...@gmail.com
   Date: Mon, Mar 3, 2014 at 10:44 PM
   Subject: Re: [ccp4bb] CCP4-6.4.0 source code building failed in Mac OS X
   10.8.5
   To: Marcin Wojdyr woj...@gmail.com
  
  
   Dear Marcin,
  
   The reason that I want to build from source is that running ipmosflm can
   not be done from binary code, while binary code only supports imosflm
   running.
  
   Thanks.
  
   Best,
  
   Donghui
  
  
   On Mon, Mar 3, 2014 at 8:09 PM, Marcin Wojdyr woj...@gmail.com wrote:
  
   Yes, I set CC=gcc-4.2.1 in cj.rc file or type in command line.
  
   As is shown, it can identify gcc for gcc-4.2.1
  
   checking for gcc... gcc-4.2.1
   checking whether the C compiler works... no
  
   To me it looks that you set compiler to non-existent gcc-4.2.1, so it
   doesn't work.
  
   Do you have a reason to build CCP4 from source? There are binaries
   available for OSX.
  
   Best regards
   Marcin
  
  
 
  - --
  - --
  Dr Tim Gruene
  Institut fuer anorganische Chemie
  Tammannstr. 4
  D-37077 Goettingen
 
  GPG Key ID = A46BEE1A
 
 


Re: [ccp4bb] High Rwork/Rfree vs. Resolution

2014-02-23 Thread Ethan Merritt
On Sunday, 23 February 2014 09:16:41 PM Andreas Förster wrote:
 On 22/02/2014 10:15, Mark van Raaij wrote:

  But I would really want to make a general comment - not ALL structures
  can be better than the average!
 
 Except structures from the Lake Wobegon Center for Structural Biology, 
 of course.

Ah, but it _is_ possible for each new structure deposition to be
better than the average quality of all previously deposited structures.

And in fact the continual improvement of detectors, programs, and
refinement protocols pushes things in exactly this direction.

I have noted only half in jest that this phenomenon is important
to the wide acceptance of validation tools like Molprobity.
By reporting quality relative to all previous structures in the PDB,
the program authors have cleverly arranged for the program to
report to most users "Green light! Your new model is better than
most structures in the PDB!"  Everyone likes to be patted on the
back and told they have done a good job, so they like the program
and continue to use it. This makes a red light score, when it
does happen, stand out more and therefore makes it more likely that
users will take it seriously.


Re: [ccp4bb] AW: [ccp4bb] Dependency of theta on n/d in Bragg's law

2013-08-22 Thread Ethan Merritt
On Thursday, August 22, 2013 02:19:11 pm Edward A. Berry wrote:
 One thing I find confusing is the different ways in which d is used. 
 In deriving Braggs law, d is often presented as a unit cell dimension, 
 and n accounts for the higher order miller planes within the cell.

It's already been pointed out above, and you sort of paraphrase it later,
but let me give my spin on a non-confusing order of presentation.

I think it is best to tightly associate n and lambda in your mind
(and in the mind of a student). If you solve the Bragg's law equation for
the wavelength, you don't get a unique answer because you are actually
solving for n*lambda rather than lambda.

There is no ambiguity about the d-spacing, only about the wavelength
that d and theta jointly select for.

That's why, as James Holton mentioned, when dealing with a white radiation
source you need to do something to get rid of the harmonics of the wavelength
you are interested in.

 But then when you ask a student to use Braggs law to calculate the resolution
 of a spot at 150 mm from the beam center at given camera length and 
 wavelength,
 without mentioning any unit cell, they ask, do you mean the first order 
 reflection?

I would answer that with "Assume a true monochromatic beam, so n is necessarily 
equal to 1."
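
[Editorial note: a minimal sketch of that calculation in Python, assuming a
flat detector normal to the beam; the 200 mm camera length and 1.0 A
wavelength below are placeholder values, only the 150 mm comes from the
question.]

    import math

    def spot_resolution(radius_mm, distance_mm, wavelength_A):
        """d-spacing probed by a spot radius_mm from the beam centre on a
        flat detector at distance_mm, for a monochromatic beam (n = 1)."""
        two_theta = math.atan(radius_mm / distance_mm)          # scattering angle
        return wavelength_A / (2.0 * math.sin(two_theta / 2.0)) # Bragg's law

    # a spot 150 mm from the beam centre, 200 mm camera length, 1.0 A X-rays
    print(round(spot_resolution(150.0, 200.0, 1.0), 2))         # ~1.58 A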

 Yes, it would be the first order reflection from planes whose spacing is the 
 answer i am looking for, but going back to Braggs law derived with the unit 
 cell
 it would be a high order reflection for any reasonable sized protein crystal.

For what it's worth, when I present Bragg's law I do it in three stages.
1) Explain the periodicity of the lattice (use a 2D lattice for clarity).
2) Show that a pair of indices hk defines some set of planes (lines)
   through the lattice.
3) Take some arbitrary set of planes and use it to draw the Bragg construction.

This way the Bragg diagram refers to a particular set of planes,
d refers to the resolution of that set of planes, and n=1 for a 
monochromatic X-ray source.  The unit cell comes back into it only if you
try to interpret the Bragg indices belonging to that set of planes.

Ethan

 
 Maybe the mistake is in bringing the unit cell into the derivation in the 
 first place, just define it in terms of 
 planes. But it is the periodicity of the crystal that results in the 
 diffraction condition, so we need the unit cell 
 there. The protein is not periodic at the higher d-spacing we are talking 
 about now (one of its fourier components is, 
 and that is what this reflection is probing.)
 eab
 
 Gregg Crichlow wrote:
  I thank everybody for the interesting thread. (I'm sort of a nerd; I find 
  this interesting.) I generally would always
  ignore that "n" in Bragg's Law when performing calculations on data, but 
  its presence was always looming in the back of
  my head. But now that the issue arises, I find it interesting to return to 
  the derivation of Bragg's Law that mimics
  reflection geometry from parallel planes. Please let me know whether this 
  analysis is correct.
 
  To obtain constructive 'interference', the extra distance travelled by the 
  photon from one plane relative to the other
  must be a multiple of the wavelength.
 
  \_/_
 
  \|/_
 
  The vertical line is the spacing d between planes, and theta is the angle 
  of incidence of the photons to the planes
  (slanted lines for incident and diffracted photon - hard to draw in an 
  email window). The extra distance travelled by
  the photon is 2*d*sin(theta), so this must be some multiple of the 
  wavelength: 2dsin(theta)=n*lambda.
 
  But from this derivation, "d" just represents the distance between /any/ 
  two parallel planes that meet this Bragg
  condition � not only consecutive planes in a set of Miller planes. However, 
  when we mention d-spacing with regards to a
  data set, we usually are referring to the spacing between /consecutive/ 
  planes. [The (200) spot represents d=a/2
  although there are also planes that are spaced by a, 3a/2, 2a, etc]. So the 
  minimum d-spacing for any spot would be the
  n=1 case. The n=2,3,4 etc, correspond to planes farther apart, also 
  represented by d in the Bragg eq (based on this
  derivation) but really are 2d, 3d, 4d etc, by the way we define "d". So we 
  are really dealing with
  2*n*d*sin(theta)=n*lambda, and so the "n"s cancel out. (Of course, I'm 
  dealing with the monochromatic case.)
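
[Editorial note: in symbols, the cancellation described above is

    \[ 2\,(n d)\sin\theta = n\lambda \quad\Longrightarrow\quad 2\,d\sin\theta = \lambda , \]

i.e. the pair of planes n*d apart gives the n-th order of exactly the same
condition that the consecutive planes (spacing d) give in first order;
nothing here beyond what the quoted paragraph already says in words.]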
 
 I never really saw it this way until I was forced to think about it 
  by this new thread -- does this make sense?
 
  Gregg
 
  -Original Message-
  From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
  Edward A. Berry
  Sent: Thursday, August 22, 2013 2:16 PM
  To: CCP4BB@JISCMAIL.AC.UK
  Subject: Re: [ccp4bb] AW: [ccp4bb] Dependency of theta on n/d in Bragg's law
 
  herman.schreu...@sanofi.com wrote:
 
Dear James,
 
thank you very much for this 

Re: [ccp4bb] TLS refinement and ANISOU records

2013-08-08 Thread Ethan Merritt
On Thursday, August 08, 2013 11:39:22 am Omid Haji-Ghassemi wrote:
 Dear all,
 
 I was about to deposit a few structures to the pdb when I noticed the mean
 B-factors were larger than one might expect.
 
 All the structures were refined using TLS refinement.
 
 During refinement in Refmac the average temperature factors for each
  structure is reasonable. For example, a structure at 2.75Å has a mean
 B-factor of 40; however, after adding the ANISOU records as required by
 the PDB, I noticed the average B-factors double.

Please see my paper:
  E. A. Merritt (2011). 
  Some Beq are more equivalent than others. Acta Cryst. A67, 512-516.
  http://skuld.bmsc.washington.edu/parvati/ActaA_67_512.pdf

In short, the quantity stored in the B field of a PDB file after TLS
refinement is Beq, which overestimates what the isotropic B factor would
have been if you had refined without TLS.  So in general the average B
after TLS refinement is always higher than the average B without TLS.
The problem is that the two quantities marked average B are not
directly comparable.
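
[Editorial note: for readers who want to check their own files, the textbook
conversion from the ANISOU diagonal to Beq is sketched below; the numbers in
the example are invented, and this is only the standard Beq formula, not the
corrected estimate discussed in the paper.]

    import math

    def beq(u11, u22, u33):
        """Equivalent isotropic B (A^2) from the diagonal of U (A^2).
        ANISOU records store U x 10**4, so divide those integers by 1e4."""
        return (8.0 * math.pi**2 / 3.0) * (u11 + u22 + u33)

    # invented example: ANISOU diagonal 5000 4000 3000 -> U = 0.50 0.40 0.30
    print(round(beq(0.50, 0.40, 0.30), 1))    # ~31.6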

Having said that, the overestimate is not usually as much as a factor of 2.
So something else may indeed be causing a problem in your case.

Ethan


 
 Is this normal?
 
 Sincerely,
 Omid
 
 ---
 ---
 Omid Haji-Ghassemi, Graduate Student
 Department of Biochemistry & Microbiology
 University of Victoria
 PO Box 3055 STN CSC
 Victoria, BC, V8W 3P6
 CANADA
 
 Tel:250-721-8945
 Fax:250-721-8855
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] TLS refinement and ANISOU records

2013-08-08 Thread Ethan Merritt
On Thursday, August 08, 2013 01:51:34 pm Omid Haji-Ghassemi wrote:
 Dear Robbie, Marcus and Reginald,
 
 Thanks again for your replies, I truly appreciate the help.
 
 The B-factors were set to 20 when performing TLS refinement so I don't
 think that is the problem.
 
 I also tried Marcus's suggestion using output from coot, with no luck.
 
 The only thing left to try is to test alternative TLS groups, as Reginald
 has suggested.

You have only told us about an increase in average B, not whether it is
uniformly inflated. Possibly the output from analysis by the Parvati server
http://skuld.bmsc.washington.edu/parvati
would indicate specific parts of your structure that are behaving
badly during refinement.

Ethan

 
 Cheers
 Omid
 
  Hi Omid,
 
  Sometimes the choice of TLS groups and to a lesser extent the initial
  B-factor matter a lot. You should try a few other TLS group selections and
  see if these give nicer results. Things to try: TLSMD, including or
  excluding ligands and carbohydrates, other common-sense or gut-feeling
  structure partitionings.  If you have a lot of different groupings to
  test, you can reset the B-factor and do pure TLS refinement (i.e. 0 cycles
  of restrained refinement) for all of them. You can then use the best one
  for your 'final' refinement. It's much faster than trying your final
  refinement with all TLS group selections.
 
  Cheers,
  Robbie
 
  Sent from my Windows Phone
  
  From: Omid Haji-Ghassemi
  Sent: 8-8-2013 21:55
  To: CCP4BB@JISCMAIL.AC.UK
  Subject: Re: [ccp4bb] TLS refinement and ANISOU records
 
  Dear Ethan,
 
  Thank you for your reply.
 
  I will try to review my refinement protocol once more; however, I am still
  perplexed at what lies at the heart of the problem.
 
  Overestimation of average B-factor using TLS is perfectly sound, but I am
  not sure why all my structures the average increases tremendously.
 
  In one case it increases from 16.36 to 73.02 for a 2.3Ang structure.
 
  I already tried changing weights and number of TLS rounds, which resulted
  in only a small change in average B.
 
  Omid
 
  On Thursday, August 08, 2013 11:39:22 am Omid Haji-Ghassemi wrote:
  Dear all,
 
  I was about to deposit a few structures to the pdb when I noticed the
  mean
  B-factors were larger than one might expect.
 
  All the structures were refined using TLS refinement.
 
  During refinement in Refmac the average temperature factors for each
  structure is reasonable. For example, a structure at 2.75Å has a
  mean
  B-factor of 40; however, after adding the ANISOU records as required by
  the PDB, I noticed the average B-factors double.
 
  Please see my paper:
E. A. Merritt (2011).
Some Beq are more equivalent than others. Acta Cryst. A67, 512-516.
http://skuld.bmsc.washington.edu/parvati/ActaA_67_512.pdf
 
  In short, the quantity stored in the B field of a PDB file after TLS
  refinement is Beq, which overestimates what the isotropic B factor would
  have been if you had refined without TLS.  So in general the average B
  after TLS refinement is always higher than the average B without TLS.
  The problem is that the two quantities marked average B are not
  directly comparable.
 
  Having said that, the overestimate is not usually as much as a factor of
  2.
  So something else may indeed be causing a problem in your case.
 
Ethan
 
 
 
  Is this normal?
 
  Sincerely,
  Omid
 
  ---
  ---
  Omid Haji-Ghassemi, Graduate Student
  Department of Biochemistry & Microbiology
  University of Victoria
  PO Box 3055 STN CSC
  Victoria, BC, V8W 3P6
  CANADA
 
  Tel:250-721-8945
  Fax:250-721-8855
 
 
  --
  Ethan A Merritt
  Biomolecular Structure Center,  K-428 Health Sciences Bldg
  University of Washington, Seattle 98195-7742
 
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ethan Merritt
On Wednesday, August 07, 2013 04:00:16 pm Ed Pozharski wrote:
 On 08/07/2013 05:54 PM, Nat Echols wrote:
  Personally, if I need to change a chain ID, I can use Coot or pdbset 
  or many other tools.  Writing code for this should only be necessary 
  if you're processing large numbers of models, or have a spectacularly 
  misformatted PDB file.  Again, I'll repeat what I said before: if it's 
  truly necessary to view or edit a model by hand or with custom shell 
  scripts, this often means that the available software is deficient.  
  PLEASE tell the developers what you need to get your job done; we 
  can't read minds.
 
 Nat,
 
 I don't think anyone here really means that the only way to change a 
 chain ID is to write, say, a perl script.  But an interpreter of the 
 kind advocated by James (as much as I have hijacked/misinterpreted his 
 vision) could indeed be very useful for people pursuing simple 
 bioinformatics projects and new ways to analyse structural models. 

We tackled this a while back for the then-current incarnation of mmCIF.

   http://www.bmsc.washington.edu/parvati/mmLib.pdf

I suppose it will all have to be revisited so that it knows the quirks,
features, and foibles of the new and improved mmCIF.

Ethan


 While 
 I understand your view that everyone should seek assistance from 
 developers with every problem encountered, I also recall some 
 reasonable idea about self-sufficiency that should cover scientific 
 research (something like give man a fish and you feed him for a day, 
 teach him to fish and he starts paying taxes... something along these 
 lines ;).  There is a difference between tools that allow one to easily 
 perform useful non-standard analysis and highly specialized tools that 
 strive to cover every situation imaginable.
 
 Cheers,
 
 Ed.
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ethan Merritt
On Wednesday, August 07, 2013 04:54:39 pm Jeffrey, Philip D. wrote:
  Nat Echols wrote:
  Personally, if I need to change a chain ID, I can use Coot or pdbset or 
  many other tools.  Writing code for
  this should only be necessary if you're processing large numbers of models, 
  or have a spectacularly
  misformatted PDB file.
 
 Problem.  Coot is bad at the chain label aspect.
 Create a pdb file containing residues A1-A20 and X101-X120 - non-overlapping 
 numbering.
 Try to change the chain label of X to A.
 I get WARNING:: CONFLICT: chain id already exists in this molecule

That would be a bug.  But it hasn't been true for any version of coot
that I have used.  As you say, this is a common thing to do and I am
certain I would have noticed if it didn't work. I just checked that
it isn't true for 0.7.1-pre.

What _is_ true is that renaming X to A in this case will not re-order
the residues in the file.  So if you had A1-100 followed by B1-10
followed by X101-200 there would not be a peptide  link between A100 and
A(old X)101 after the renaming.
To fix this you need to write out the file and use an editor to move the
records for A101-200 to immediately after the records for A1-100.
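
[Editorial note: the relabel-and-reorder step is easy to script outside coot;
a minimal Python sketch for plain ATOM/HETATM records follows.  Column 22
holds the chain ID; headers, TER, ANISOU and CONECT records are deliberately
dropped, so this is not a general PDB editor, and the file names are made up.]

    def relabel_and_sort(pdb_lines, old='X', new='A'):
        """Keep only ATOM/HETATM records, rename chain `old` to `new`, and
        re-sort by (chain id, residue number) so the renamed residues follow
        straight on from the existing chain."""
        coords = []
        for ln in pdb_lines:
            if ln.startswith(('ATOM  ', 'HETATM')):
                if ln[21] == old:                    # column 22 = chain ID
                    ln = ln[:21] + new + ln[22:]
                coords.append(ln)
        coords.sort(key=lambda ln: (ln[21], int(ln[22:26])))   # stable sort
        return coords

    with open('test.pdb') as f:                      # made-up file names
        fixed = relabel_and_sort(f.readlines())
    with open('renamed.pdb', 'w') as f:
        f.write(''.join(fixed) + 'END\n')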

This does illustrate the point that expecting all tools to handle all
possible manipulations is unrealistic.  I think there will always be a
need for a separate tool that can do anything imaginable, whether that
tool is vi or emacs or some spiffy new mmCIF editing GUI.

The problem with this is that any tool capable of arbitrarily editing
your file is also capable of subtly mangling your file.  The current PDB
format is horribly sensitive to this.  For example if you
reorder/renumber/relabel ATOM records in a PDB file then references to them
in the header records (TLS, SITE, etc) and LINK/CONECT records will now point
to the wrong atoms.   I am not convinced that the new mmCIF format has gotten
this quite right either, at least in the examples given, but it does have the
flexibility to attach such links or properties directly to the ATOM record
where it is more likely to be carried along correctly if moved. 
That by itself is IMHO enough to justify the switch from PDB to mmCIF.

Ethan


 
 This is (IMHO) a bizarre feature because this is exactly the sort of thing 
 you do when building structures.
 
 Therefore I do one of two things:
 1.  Open it in (x)emacs, replace " X " with " A " and Bob's your uncle.
 2.  Start Peek2 - that's my interactive program for doing simple and stupid 
 things like this.  I type "read test.pdb" and "chain" and Peek2 prompts me at 
 perceived chain breaks (change in chain label, CA-CA breaks, ATOM/HETATM 
 transitions &c) and then "write test.pdb".   Takes less than 10 seconds.  
 CCP4i would probably still be launching, as would Phenix.
 
 The reason I do #1 or #2 is not to be a Luddite, but to do something trivial 
 and boring quickly so I can get back to something interesting like building 
 structures, or beating subjects to death on CCP4bb.
 
 What's lacking is an interactive, or just plain fast method in any guise, way 
 of doing simple PDB manipulations that we do tons of times when building 
 protein structures.  I've used Peek2 thousands of times for this purpose, 
 which is the only reason it still exists because it's a fairly stupid 
 program.  A truly interactive version of PDBSET would be splendid.  But, 
 again, it always runs in batch mode.
 
 mmCIF looked promising, apropos emacs, when I looked at the spec page at:
 http://www.iucr.org/__data/iucr/cifdic_html/2/cif_mm.dic/Catom_site.html
 because that ATOM data is column-formatted.  Cool.  However looking at 
 6LYZ.cif from RCSB's site revealed that the XYZ's were LEFT-justified: 
 http://www.rcsb.org/pdb/files/6LYZ.cif
 which makes me recoil in horror and resolve to use PDB format until someone 
 puts a gun to my head.
 
 Really, guys, if you can put multiple successive spaces to the RIGHT of the 
 number, why didn't you put them to the LEFT of it instead ?  Same parsing, 
 better readability.
 
 Phil Jeffrey
 Princeton
 (using the vernacular but deathly serious about protein structure)
 
 
 
 
 
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] post to ccp4bb

2013-07-22 Thread Ethan Merritt
On Monday, 22 July 2013, Katherine Donovan wrote:
 Hi All,
 
 I have a data set that was collected to about 2.2A, which I have processed in 
 either P21 (to 2.4 A) or C2221 (2.25A).

So I'm confused.
You may not know what the spacegroup is, but you are processing
the same spots either way.  Why would you choose different
resolutions for the two processing runs?

 I am unsure which space group is more correct.
 
 I have a higher symmetry space group with higher resolution and average 
 statistics or a lower symmetry space group with lower resolution and great 
 statistics.

By great statistics, do you mean the refined R/Rfree in P21?
But you refined that model as a twin, which means that
the R factors are expected to be lower.  Refining as a twin
always produces lower R factors, whether or not the structure
really is twinned.

It doesn't make sense to me that you would compare at two 
different resolutions.  It's the _same data_ in either case.
If you refine two different models against the same data,
then you have a legitimate basis on which to compare them.
This way - not really.

Which brings up the point that something seems to have
gone wrong in one of your processing runs.
Both runs claim mean (I/sigI) in the outer shell is 2.0,
but in one case this is for the 2.4A shell and in the other case
it's for the 2.2A shell.  That is unlikely to be correct.
I/sigI should not depend on the Laue group. 

If it were me, I'd forget about monoclinic.
Also I'd try to push the data processing in orthorhombic
to a bit higher resolution.  Mean I/sig(I) of 2 in the 2.2A shell
leads me to think you would still be adding information
from reflections at 2.1A or even 2.0A. 

Ethan

 The statistics provided by aimless below.
 
 
 Any help would be hugely appreciated.
 
 Thanks,
 
 Katherine
 
 
 
 
 P21
 AIMLESS
 P21 and cut the data to 2.4A for a Mn I/sd > 2.
 
 Average unit cell:   74.68  130  129.2   90  106.8  90
Overall  InnerShell  OuterShell
 Low resolution limit   48.09 48.09  2.44
 High resolution limit   2.40 13.15  2.40
 Rmerge  (within I+/I-)         0.085 0.038 0.581
 Rmerge  (all I+ and I-)        0.099 0.042 0.687
 Rmeas (within I+/I-)           0.117 0.053 0.792
 Rmeas (all I+ & I-)            0.116 0.050 0.798
 Rpim (within I+/I-)            0.080 0.037 0.534
 Rpim (all I+ & I-)             0.059 0.026 0.405
 Rmerge in top intensity bin    0.045   -     -
 Total number of observations  352569  2057 17425
 Total number unique92184   569  4538
 Mean((I)/sd(I))  9.6  23.6   2.0
 Mn(I) half-set correlation CC(1/2) 0.995 0.995 0.648
 Completeness   100.0  97.2 100.0
 Multiplicity 3.8   3.6   3.8
 
 PHENIX – XTRIAGE
 One possible pseudo merohedral twin operator
 2-fold axis
 h, -k, -h-l
 
 <I**2>/<I>**2 = 2.032
 <F>**2/<F**2> = 0.787
 <|E**2-1|> = 0.734
 <|L|>, <L**2> = 0.490, 0.321
 Multivariate Z score L-test = 0.616
 
 NZ test = Maximum deviation acentric = 0.007
 Maximum deviation centric = 0.051
 L test =   Mean L = 0.490
 
 Estimated twin fraction:
 0.450 (Britton analyses)
 0.477 (H-test)
 0.478 (Maximum likelihood method)
 
 Likely point group of the data is C 2 2 2
 
 Analysis of the systematic absences indicates a number of likely space group 
 candidates:
 C 2 2 21
 
 Patterson analysis of peak with length larger than 15 Angstrom:
 Frac. Cood = 0.00, 0.166, 0.00
 Distance to origin = 21.530
 Height (origin = 100) = 3.787
 p-value (height) = 9.991e-01
 
 Final REFMAC refinement in P21
 Rfactor = 0.2391
 Rfree = 0.2674
 After multiple rounds of refinement the twinning information is:
 Twin domains = 2
 Twin fractions = 0.5201, 0.4799
 
 FINAL refinement PHENIX – P21
 Rwork = 0.1637
 Rfree = 0.1938
 Twin fraction = 0.5 for twin operator h, -k, -h-l
 Ramachandran outliers = 0.1%
 Rotamer outliers = 3.6%
 C-beta outliers = 0
 
 
 C2221
 AIMLESS
 Cut the data back to 2.25 for a Mn I/sd > 2.
 
 Average unit cell:   74.68  247.4  130   90  90  90
Overall  InnerShell  OuterShell
 Low resolution limit   48.09 48.09  2.31
 High resolution limit   2.25  9.81  2.25
 
 Rmerge  (within I+/I-) 0.117 0.044 0.984
 Rmerge  (all I+ and I-)        0.126 0.046 1.068
 Rmeas (within I+/I-)           0.136 0.051 1.145
 Rmeas (all I+ & I-)            0.135 0.050 1.147
 Rpim (within I+/I-)            0.069 0.026 0.577
 Rpim (all I+ & I-) 

[ccp4bb] How to disable auto-renaming of files in ccp4i refmac menu?

2013-07-12 Thread Ethan Merritt
I dearly love ccp4i, but there is one aspect of the interface that
drives me to distraction.  When I type (or paste) an input file name
(let's say input.mtz) into the refmac menu, ccp4i changes various
output file names automatically according to some scheme that doesn't
at all match my work flow.  How can I disable this?  

I imagine it is simply a matter of commenting out a couple of lines
in a script but I haven't been able to figure out which lines those are.

Better yet - if it must rename, is there a way I can give it a new
rule for naming output files?   My workflow wants all output files
from the same refinement run to have the same base name.  E.g.
RUN_N.pdb RUN_N.mtz and RUN_N.tls  regardless of what the input file 
names might be (typically dataset.mtz and RUNsomething-coot-N.pdb).

Is there still time to request this as a feature for the next
release?

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Definition of diffractometer

2013-06-19 Thread Ethan Merritt
On Wednesday, June 19, 2013 11:11:01 am Edward A. Berry wrote:
 Somewhere I got the idea that a diffractometer is an instrument that measures one
 reflection at a time. Is that the case, and if so what is the term for instruments
 like rotation camera, Weissenberg, area detector? (What is an area detector?).

As I originally learned the term, it meant a photon-counting device (discrete 
counts)
as opposed to film or other analog measurement.

Stout & Jensen (1968 1st ed page 149):
  Two general methods are available for measuring the intensities of diffracted
  beams. Either the beams may be detected by some sort of quantum counting device
  which measures the number of photons directly (diffractometer or counter methods)
  or else the degree of blackening of spots on diffraction photographs may be
  measured and taken as proportional to the beam intensity (photographic methods).

All rotation cameras and Weissenberg cameras that I have encountered used
film or image plates for recording data, so they fall in the photographic
methods category.  But really both of these refer to the geometry used 
during the experiment.  In principle you could mount a Pilatus detector
on a rotation camera, or kit out a cylindrical drift chamber as a
Weissenberg camera (maybe).  That would shift them into the diffractometer
category.

Area detectors came later. But they too come in both photon-counting varieties
(multiwire detectors, pixel detectors) and analog proportional detectors
(imaging plates, CCD).   The line is blurred in the case of CCD/pixel detectors
operating in a mode where accumulated charge is translated back into a specific
number of photons.

 Logically I guess a diffractometer could be anything that measures 
 diffraction, 
 and that seems to be view of the wikipedia article of that name.

That does not match the historical use of the term.


-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Off-topic: NMR and crystallography

2013-06-09 Thread Ethan Merritt
On Sunday, 09 June 2013, Theresa Hsu wrote:
 Dear all
 
 A question for the cross-trained members of this forum - for small sized 
 proteins, is NMR better than crystallography in terms of data collection 
 (having crystals in the first place) and data processing? How about membrane 
 proteins?

A relevant study is the comparison by Yee et al (2005) JACS 127:16512.
  http://pubs.acs.org/doi/abs/10.1021/ja053565+

They tried to solve 263 small proteins using both NMR and crystallography.
43 only worked for NMR
43 only worked for X-ray
21 could be solved either way

So you could say it was a toss-up, but consider that
- As the size gets larger, NMR becomes increasingly impractical
- 156 (60%) weren't solved by either NMR or crystallography.
  What is the relative cost of the failed attempt?

Ethan


Re: [ccp4bb] anomalous scattering server down?

2013-06-01 Thread Ethan Merritt
On Saturday, 01 June 2013, Edward A. Berry wrote:
 Is Ethan Merritt's anomalous scattering page at:
 http://www.bmsc.washington.edu/scatter/
 down or moved, or  the firewall I'm behind is blocking it?

The UW, in its infinite wisdom, scheduled a power outage 
today so that they could replace the network infrastructure.
Then to be on the safe side someone took down the network 
in my building early.

Anyhow, when I get the all-clear I will head back into the
lab to restart everything.

Ethan



 
 I want to check feasibility of a native-iron MAD experiment,
 and I'm not very good at math.
 
 thanks,
 eab
 


Re: [ccp4bb] atomic coloring for the color blind

2013-05-31 Thread Ethan Merritt
On Friday, May 31, 2013 01:34:51 pm Phoebe A. Rice wrote:
 I feel badly that one of my undergrads had trouble telling an O from a C in a 
 pymol homework set because he's color blind. (The assignment involved telling 
  me why a GTP analog (GDPCP) wasn't hydrolyzed).
 Is there a handy by-atom coloring scheme I can recommend that works for the 
 red-green color blind?

Phoebe:

Here is the podo color palette recommended as being distinguishable
by both protanopic and deuteranopic color-blind viewers.  The down side is
that this is a more stringent restriction than accommodating red/green
color defects alone, and makes the colors less distinct for normal-vision 
viewers.

%
# This file is distributed as part of gnuplot.
# Palette of colors selected to be easily distinguishable by
# color-blind individuals with either protanopia or deuteranopia
# Bang Wong [2011] Nature Methods 8, 441.
set linetype 1 lc rgb "black"
set linetype 2 lc rgb "#e69f00"
set linetype 3 lc rgb "#56b4e9"
set linetype 4 lc rgb "#009e73"
set linetype 5 lc rgb "#f0e442"
set linetype 6 lc rgb "#0072b2"
set linetype 7 lc rgb "#d55e00"
set linetype 8 lc rgb "#cc79a7"
set linetype cycle 8
%

Translating these colors to atom types is another question, however.
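
[Editorial note: one way to carry the same palette over to per-atom colouring
is via PyMOL's Python API; the element assignments below are an arbitrary
illustration, not part of Wong's recommendation.]

    # Map the Wong (2011) palette onto element colours in PyMOL.
    from pymol import cmd

    palette = {                                  # hex values from the list above
        'cb_orange':       (0.90, 0.62, 0.00),   # #e69f00
        'cb_sky_blue':     (0.34, 0.71, 0.91),   # #56b4e9
        'cb_bluish_green': (0.00, 0.62, 0.45),   # #009e73
        'cb_yellow':       (0.94, 0.89, 0.26),   # #f0e442
        'cb_vermillion':   (0.84, 0.37, 0.00),   # #d55e00
    }
    for name, rgb in palette.items():
        cmd.set_color(name, list(rgb))

    # arbitrary example assignment, avoiding a red/green C-O contrast:
    cmd.color('cb_yellow',       'elem C')
    cmd.color('cb_vermillion',   'elem O')
    cmd.color('cb_sky_blue',     'elem N')
    cmd.color('cb_bluish_green', 'elem S')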

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Fwd: Re: [ccp4bb] reference for true multiplicity?

2013-05-14 Thread Ethan Merritt
On Tuesday, May 14, 2013 01:58:06 pm Colin Nave wrote:

 The use of the term redundancy (real or otherwise!) in crystallography 
 is potentially misleading as the normal usage means superfluous/surplus 
 to requirements.  

That may be true in the UK, but on this side of the pond redundancy 
normally refers to ensuring a safety margin by having more of whatever
than is strictly needed for functionality, so that even if some of the
whatsits fail you have enough remaining to go on with. The use of
the term in crystallography is perfectly normal to American ears.

Here's a definition from Wikipedia

 redundancy is the duplication of critical components or functions 
  of a system with the intention of increasing reliability of the 
  system...

just another tidbit of cross-pond difference in language.

Ethan


 The closest usage I can find from elsewhere is in information theory where it 
 is applied for purposes of error detection when communicating over a noisy 
 channel. Seems similar to the crystallographic use.
 
 The more relevant point is what sort of errors would be mitigated by having 
 different paths through the crystal. The obvious ones are absorption errors 
 and errors in detector calibration. Inverse beam methods can mitigate these 
 by ensuring the systematic errors are similar for the reflections being 
 compared. However, my interpretation of the Acta D59 paper is that it is 
 accepted that systematic errors are present and, by making multiple 
 measurements under different conditions, the effect of these systematic 
 errors will be minimised.
 
 Can anyone suggest other sources of error which would be mitigated by having 
 different paths through the crystal. I don't think radiation damage 
 (mentioned by several people) is one.
 
 Colin
 
 From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Frank 
 von Delft
 Sent: 14 May 2013 14:23
 To: ccp4bb
 Subject: [ccp4bb] Fwd: Re: [ccp4bb] reference for true multiplicity?
 
 George points out that the quote I referred to did not make it to the BB -- 
 here we go, read below and learn, it is a most succinct summary.
 phx
 
  Original Message 
 Subject: Re: [ccp4bb] reference for true multiplicity?
 Date: Tue, 14 May 2013 09:25:22 +0100
 From: Frank von Delft frank.vonde...@sgc.ox.ac.uk
 To: George Sheldrick gshe...@shelx.uni-ac.gwdg.de
 
 
 Thanks!  It's the Acta D59 p688 I was thinking of - start of discussion:
 The results presented here show that it is possible to solve
 protein structures using the anomalous scattering from native
 S atoms measured on a laboratory instrument in a careful but
 relatively routine manner, provided that a sufficiently high
 real redundancy is obtained (ranging from 16 to 44 in these
 experiments). Real redundancy implies measurement of
 equivalent or identical reflections with different paths through
 the crystal, not just repeated measurements; this is expedited
 by high crystal symmetry and by the use of a three-circle (or κ)
 goniometer.
 Wise words...
 
 phx
 
 
 On 14/05/2013 08:06, George Sheldrick wrote:
 Dear Frank,
 
 We did extensive testing of this approach at the beginning of this millennium - see
 Acta Cryst. D59 (2003) 393 and 688 - but never claimed that it was our idea.
 
 Best wishes,
 George
 
 On 05/14/2013 06:50 AM, Frank von Delft wrote:
 
 Hi, I'm meant to know this but I'm blanking, so I'll crowdsource instead:
 
 Anybody know a (the) reference where it was showed that the best SAD data is 
 obtained by collecting multiple revolutions at different crystal offsets 
 (kappa settings)?  It's axiomatic now (I hope!), but I remember seeing 
 someone actually show this.  I thought Sheldrick early tweens, but PubMed is 
 not being useful.
 
 (Oh dear, this will unleash references from the 60s, won't it.)
 
 phx
 
 
 
 
 
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


[ccp4bb] A crystallographer on Mars

2013-05-06 Thread Ethan Merritt
The _New Yorker_ frequently publishes decently written articles on a
huge variety of topics.  Occasionally they come out with one about
science, sometimes with a focus on a public policy issue, sometimes a
biographical piece about a mainstream or not-so-mainstream scientist,
sometimes a serialized first publication of a book by a scientist
written for a wide audience.  So I was not terribly surprised to find
in the 22 April issue an article about the Mars rover Curiosity and
the team that designed it.  
http://www.newyorker.com/reporting/2013/04/22/130422fa_fact_bilger
Spread across two pages in the center of the issue was a color image
of the Curiosity rover itself.  Just the thing to inspire creative use 
of one's Lego collection
http://www.space.com/17058-mars-rover-curiosity-lego-instructions.html.

But then it got a bit strange.  The caption reads:

  ... the mission includes a nuclear-powered mobile laboratory,
  equipped with lasers, spectrometers, and an X-ray crystallographer.

Wow!  Who's the lucky Mars-going crystallographer?  Anyone we know?

The article text goes on to quote one of the JPL rover team members:

  Curiosity came equipped with lasers, spectrometers, and a gas
  chromatograph. It had a radiation detector, an X-ray crystallographer,
  and a complete weather station. [...] It was like a Hummer with a
  half-dozen scientists crammed inside.  

OK, so at least our lucky crystallographic colleague has some company
out there on Mars.  Still, I do wonder exactly what it said in the
job ad they responded to. Anyone know what sort of X-ray source they
packed with them?

Ethan


Re: [ccp4bb] CCP4 Update victim of own success

2013-04-11 Thread Ethan Merritt
On Thursday, April 11, 2013 10:22:59 am Antony Oliver wrote:
 Eugene - that's great. I too run a small suite of Macs (12) and was trying to 
 find a practical way of updating all those machines remotely. The command 
 line version of CCP4um will be very useful. 

Another option for a set of machines in the same network is to install a single 
master copy of ccp4 on one machine exported to the others via NFS, and have all 
the machines run it from there.  Then you only need to update one copy.
Works fine for me.

Ethan



 
 Many thanks. 
 
 Tony. 
 
 Sent from my iPhone
 
 On 11 Apr 2013, at 18:19, eugene.krissi...@stfc.ac.uk 
 eugene.krissi...@stfc.ac.uk wrote:
 
  Dear Dale,
  
  From next CCP4 release (due soon), ccp4um will be runnable from command 
  line in automatic, non-graphical mode, fully cronable. I hope that that 
  will give you what you want.  --check-silent is a special option for 
  ccp4i, it only checks for new updates but does not install them.
  
  Best regards,
  
  Eugene
  
  
  On 11 Apr 2013, at 18:10, Dale Tronrud wrote:
  
  FYI
  
   I have a small herd of computers here and find it cumbersome to ssh
  to each and fire up ccp4i just to update the systems.  ccp4i takes a
  while to draw all those boxes (particularly over ssh) and leaves files
  behind in my disk areas on computers that I'm not likely to, personally,
  run crystallographic computations.  I much prefer to simply run ccp4um
  from the command line.
  
   In fact, I would rather put it in cron and forget about it -- and
  I expect that is what --check-silent is for.  The usage statement,
  however, doesn't explicitly say that this installs the new updates it
  finds.  I'll have to experiment a bit.
  
  Dale Tronrud
  
  On 04/11/2013 05:17 AM, eugene.krissi...@stfc.ac.uk wrote:
  Sorry that this was unclear. We assume that updater is used primarily 
  from ccp4i, where nothing changed (and why it should be used from command 
  line at all ?:)). The name was changed because it is reserved in Windows, 
  which caused lots of troubles. Now it will stay as is.
  
  Eugene
  
  On 11 Apr 2013, at 05:16, James Stroud wrote:
  
  
  On Apr 10, 2013, at 9:30 PM, eugene.krissi...@stfc.ac.uk wrote:
  
  No, it got renamed to ccp4um :) That should have been written in update 
  descriptions, was it not?
  
  
  There was only one mention of ccp4um that I could find in all update 
  descriptions that I found (6.3.0-020). I only figured out what 
  information was trying to be communicated because of your message (see 
  attachment).
  
  James
  
  
  um-what.png
  
  
  
  On 11 Apr 2013, at 03:54, James Stroud wrote:
  
  Hello All,
  
  I downloaded a crispy new version of CCP4 and ran update until the update 
  update script disappeared. Is the reason that CCP4 has reached its final 
  update?
  
  James
  
  
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] CCP4 Update victim of own success

2013-04-11 Thread Ethan Merritt
On Thursday, April 11, 2013 01:53:16 pm David Schuller wrote:
 On 04/11/13 13:36, Ethan Merritt wrote:
  On Thursday, April 11, 2013 10:22:59 am Antony Oliver wrote:
  Eugene - that's great. I too run a small suite of Macs (12) and was trying 
  to find a practical way of updating all those machines remotely. The 
  command line version of CCP4um will be very useful.
  Another option for a set of machines in the same network is to install a 
  single
  master copy of ccp4 on one machine exported to the others via NFS, and have 
  all the
  machines run it from there.  Then you only need to update one copy.
  Works fine for me.
 
  Ethan
 
 My method is to run the updater graphically on one machine, then spread 
 it around with rsync. Although being able to run it on the command line 
 would allow me to accomplish that from my own desk, without crossing 
 campus to another building. Even with gigabit, running X remotely is 
 rather slow and bothersome.

You may misunderstand - the executables live on a shared NFS directory
but there is no remote X connection involved.

Having said that, I routinely connect to the lab machines from home via ssh.
In that case the X connection is remote, but I find that the performance of
the ccp4i GUI is adequate even across the WAN.

Ethan  



 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Query regarding the use of anisotropic temperature factor and ideal rmsAngle and rmsBond length values

2013-03-17 Thread Ethan Merritt
On Sunday, 17 March 2013, Pavel Afonine wrote:
 Hi Sonali,
 
 regarding isotropic vs anisotropic parameterization of your individual
 ADPs: apart from common sense and theoretical considerations, this is also
 in great part software dependent.
 
 I can't speak for other programs, but for phenix.refine I would say the
 rule of thumb is:
 - higher than 1.5A: refine macromolecule with individual anisotropic ADPs
 (the rest - isotropic);

I would place the expected resolution break-even point at more like
1.2 - 1.3 A.  But that's only an expectation, not a rule to rely on.
You should justify anisotropic refinement of a structure on the basis
of its own particular model and measured data.  Robbie Joosten has
already pointed out that you can use the PDB-Redo scripts to test
whether individual anisotropic ADPs are justified.

 - higher than 1.2A: all anisotropic (macromolelcule, water, ligands)
 - lower than 1.7A: all isotropic;
 - 1.5-1.7A is a grey area where there is only one single way to know for
 sure: try both (isotropic and anisotropic) and see which one works best.
 I realize works best is a broad term, but I would say Rwork, Rfree,
 Rfree-Rwork and values of refined anisotropic ADPs should be enough to make
 a decision.

Unfortunately, Rfree cannot be used reliably for this purpose.  

Please see my "To B or not to B" Acta D paper from last year
for worked examples of why Rfree cannot be trusted in this case.
In particular, in the 1.5 - 1.7A region Rfree is likely to
indicate incorrectly that anisotropic refinement is OK, whereas
peeking at the answer sheet (i.e. using a known structure where
true atomic resolution data is available) demonstrates that the
anisotropic refinement is garbage even though Rfree is improved.

cheers,

Ethan

 If it's not a neutron data, H atoms should be always isotropic (kind of
 obvious, but mentioning it just in case..).
 
 Good luck,
 Pavel
 
 On Sun, Mar 17, 2013 at 1:06 AM, sonali dhindwal 
 sonali11dhind...@yahoo.co.in wrote:
 
  Dear All,
 
  We want little suggestion and knowledge regarding refinement of data in
  Refmac. We have a data with resolution upto 1.5A. Overall redundancy of 5.5
  and 3.7 in high resolution bin. and I over Sigma is also 21 overall and
  2.2 in last resolution bin.
 
  When we first did isotropic refinement we used an automatic weighting term,
  which gave good Rfree and Rfactor of 18.4 and 16.9 but high rmsBond and
  rmsAngle of 0.027 and 2.5 respectively. We were able to improve rmsBond and
  rmsAngle values by decreasing the weighting term to 0.5.
 
  But when we do anisotropic refinement with a weighting term of 0.5 it gives
  Rfree, Rfactor and FOM of 16.8, 15.0 and 90.7 respectively. And rmsAngle
  and rmsBond of 0.0074 and 1.25.
 
  Now, we want to know what should be the ideal values for rmsAngle and
  rmsBond at such resolution. Secondly, if we can use anisotropic refinement
  with such data.
 
  All your suggestions will be highly valuable.
  Thanks in advance.
 
  --
  Sonali Dhindwal
 
  “Live as if you were to die tomorrow. Learn as if you were to live
  forever.”
 
 


Re: [ccp4bb] Query regarding the use of anisotropic temperature factor and ideal rmsAngle and rmsBond length values

2013-03-17 Thread Ethan Merritt
On Sunday, 17 March 2013, Pavel Afonine wrote:
 Hi Ethan,
 
 
 I would place the expected resolution break-even point at more like
  1.2 - 1.3 A.  But that's only an expectation, not a rule to rely on.
  You should justify anisotropic refinement of a structure on the basis
  of its own particular model and measured data.  Robbie Joosten has
  already pointed out that you can use the PDB-Redo scripts to test
  whether individual anisotropic ADPs are justified.
 
 
 when it comes to a point when a choice of refinement strategy cannot be
 uniquely and reliably chosen based on theoretical considerations, that
 opens a great opportunity for endless perennial discussions like this. My
 point was that if you are lucky and there isn't many options (in this case
 there are only two: iso vs aniso!) there is an easier, quicker and robust
 alternative: simply try both and that will give you THE answer. Perhaps
 not very scientific (no monster formula derived) but quick, easy and robust!
 
 
   - higher than 1.2A: all anisotropic (macromolelcule, water, ligands)
   - lower than 1.7A: all isotropic;
   - 1.5-1.7A is a grey area where there is only one single way to know for
   sure: try both (isotropic and anisotropic) and see which one works best.
   I realize works best is a broad term, but I would say Rwork, Rfree,
   Rfree-Rwork and values of refined anisotropic ADPs should be enough to
  make
   a decision.
 
  Unfortunately, Rfree cannot be used reliably for this purpose.
 
 
 Yes, Rfree cannot be used for this purpose reliably, very true. This is
 exactly why I wrote above .. Rwork, Rfree, Rfree-Rwork *and values of
 refined anisotropic ADPs* should be enough to make a decision. My
 understanding is that your To B.. method is one of possible ways of
 looking at the values of refined anisotropic B-factors.

That is a misunderstanding that misses the fundamental point.
Looking at the refined ADPs only helps in the artificial case that
you have a known-correct set of ADPs as a point of comparison for the 
newly refined ADPs [*].

You cannot do that in the case of a new structure.
  
The Hamilton R test does not look at the individual ADPs or at any other
refined parameter values.  It looks only at the crystallographic residuals
and the respective degrees of freedom.  In this sense it is analogous
to comparing Rfree values, but as demonstrated in the paper the Hamilton
test can correctly detect an invalid model in cases where Rfree cannot.
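
[Editorial note: for readers unfamiliar with it, the Hamilton (1965) test
compares only the weighted residuals of two nested models; a compact
statement from memory -- see Hamilton (1965) and the Acta D paper for the
exact form:

    \[ \mathcal{R} = \frac{R_{\mathrm{iso}}}{R_{\mathrm{aniso}}}
       \quad\text{compared against}\quad
       \mathcal{R}_{\,b,\;n-m,\;\alpha}, \qquad b = m_{\mathrm{aniso}} - m_{\mathrm{iso}},  \]

where n is the number of observations, m the number of parameters of the less
constrained (anisotropic) model, and alpha the chosen significance level; only
if the ratio exceeds the tabulated value is the extra parameterisation
statistically justified.]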

Ethan


[*] Of course if the new ADPs are inherently unreasonable, for instance
non-positive definite, that is good reason to reject the model.
But I am assuming that a properly behaving refinement program will not
produce such an inherently unbelievable model.  Instead the outcome
of refinement will be internally consistent, plausible, but wrong.
You won't learn that just by inspecting the ADPs it produces.


 All the best,
 Pavel
 


Re: [ccp4bb] first use of synchrotron radiation in PX

2013-03-16 Thread Ethan Merritt
On Saturday, 16 March 2013, James Holton wrote:
 The first report of shooting a protein crystal at a synchrotron (I 
 think) was in 1976:
 http://www.pnas.org/content/73/1/128.full.pdf
 that was rubredoxin
 
 The first PDB file that contains a SYNCHROTRON=Y entry is 1tld 
 (trypsin), which was deposited in 1989:
 http://dx.doi.org/10.1016/0022-2836(89)90110-1
 But the structure of trypsin was arguably already solved at that time.
 
 Anomalous diffraction was first demonstrated by Coster, Knoll and Prins 
 in 1930
 http://dx.doi.org/10.1007/BF01339610
 this was 20 years before Bijvoet.  But not with a synchrotron and 
 definitely not with a protein
 
 The first protein to be solved using anomalous was crambin in 1981:
 http://dx.doi.org/10.1038/290107a0
 but this was not using a synchrotron
 
 The first demonstration of MAD on a protein at a synchrotron was a Tb 
 soak of parvalbumin in 1985
 http://dx.doi.org/10.1016/0014-5793(85)80207-6
 but one could argue that several parvalbumins were already known at that 
 time.
 
 The first MAD structure from native metals was cucumber blue copper 
 protein (2cbp) in 1989
 http://dx.doi.org/10.1126%2Fscience.3406739

The original CBP MAD structure (1CBP) was published in 1988.

Also 1988:
  Lamprey hemoglobin (Fe MAD) DOI: 10.1002/prot.340040202

1989:
  Streptavidin (Se MAD): PNAS 1989 86 (7) 2190-2194

 The first new structure using MAD, as well as the first SeMet was 
 ribonuclease H (1rnh) in 1990
 http://dx.doi.org/10.1126/science.2169648
 
 If anyone knows of earlier cases, I'd like to hear about it!

Ethan

 
 -James Holton
 MAD Scientist
 
 On 3/13/2013 7:38 AM, Alan Cheung wrote:
  Hi all - i'm sure this many will know this : when and what was the 
  first protein structure solved on a synchrotron?
 
  Thanks in advance
  Alan
 
 
 


Re: [ccp4bb] Which program sequence to use in transforming from P1 to orthorhombic?

2013-02-12 Thread Ethan Merritt
On Tuesday, February 12, 2013 12:39:57 am Phil wrote:
 Scale constant in Aimless or Scala should do it. I should probably make 
 that automatic. 

scale constant did indeed persuade aimless/scala to run.
However, what seems to have happened is that aimless/scala expanded the original
[I, SIGI] into [I+, SIGI+] [I-, SIGI-], but all the [I-, SIGI-]  entries were
filled in as zero.  When ctruncate runs, it segfaults on a divide by zero error.
If I filter out the +/- columns and run ctruncate again, all is well.
So aside from anything else, I think ctruncate needs some sanity checks for
all-zero columns.
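
[Editorial note: the kind of sanity check meant here is easy to express; the
sketch below uses gemmi's Python bindings, which postdate this thread, so
treat the exact API as an assumption rather than a CCP4 recipe.]

    # Flag all-zero columns in an MTZ, e.g. a bogus I(-)/SIGI(-) pair.
    import sys
    import gemmi

    mtz = gemmi.read_mtz_file(sys.argv[1])
    for col in mtz.columns:
        values = col.array                    # numpy array of column values
        if values.size and not values.any():  # every entry is exactly zero
            print('all-zero column:', col.label, '(type %s)' % col.type)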

Ethan


 
 I should probably also add a CIF reader to Pointless. Is there a good (easy) 
 C++ one out there?
 
 Phil 
 
 Sent from my iPad
 
 On 12 Feb 2013, at 08:08, Jens Kaiser kai...@caltech.edu wrote:
 
  Ethan,
   The last time I attempted similar things, I had to run rotaprep to
  convince scala of using most things that did not come directly out of
  mosflm, but that was before the pointless days. 
   As the reflections are already scaled in P1, I would consider it safe
  to rely on the Pointless Rmerge -- but that's just a guess (and you
  can't do much with the data downstream). I would assume sftools might be
  able to merge the reindexed file output by pointless.
Nevertheless, if I were faced with the same problem nowadays, I would
  convert to a shelx hkl file and use xprep for the merging and statistics
  -- that's painless.
  
  Cheers,
  
  Jens
  
  On Mon, 2013-02-11 at 13:56 -0800, Ethan Merritt wrote:
  Hi all,
  
  I've downloaded a structure factor file from the PDB that presents
  itself as being triclinic.  It contains F, sig(F), and Rfree only.
  The P1-ness of this structure is dubious, however.
  
  Pointless is 99.6% sure it's orthorhombic and puts out an mtz file
  in P212121 containing 
 I SIGI BATCH M/ISYM
  
  where the batch numbers are all 1 and ISYM runs from 1 to 8.
  So far so good, but now I'm stuck.  I can't persuade Scala
  or Aimless to merge the symmetry mates and report a merging
  R factor.Is there a trick to this?  Some other program sequence?
  
 Ethan
  
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Which program sequence to use in transforming from P1 to orthorhombic?

2013-02-12 Thread Ethan Merritt
On Tuesday, February 12, 2013 09:52:19 am Phil wrote:
 Ah I'm not sure about that. It may be possible to tell ctruncate not to do 
 this. 
 Actually if you started with Fs you don't want to truncate the data. 

Pointless changes the Fs to Is, so you need to get back to Fs somehow.

Ethan


 Maybe use old truncate with the notruncate option
 Phil
 
 Sent from my iPad
 
 On 12 Feb 2013, at 18:48, Ethan Merritt merr...@u.washington.edu wrote:
 
  On Tuesday, February 12, 2013 12:39:57 am Phil wrote:
  Scale constant in Aimless or Scala should do it. I should probably make 
  that automatic.
  
  scale constant did indeed persuade aimless/scala to run.
  However, what seems to have happened is that aimless/scala expanded the 
  original
  [I, SIGI] into [I+, SIGI+] [I-, SIGI-], but all the [I-, SIGI-]  entries 
  were
  filled in as zero.  When ctruncate runs, it segfaults on a divide by zero 
  error.
  If I filter out the +/- columns and run ctruncate again, all is well.
  So aside from anything else, I think ctruncate needs some sanity checks for
  all-zero columns.
  
 Ethan
  
  
  
  I should probably also add a CIF reader to Pointless. Is there a good 
  (easy) C++ one out there?
  
  Phil 
  
  Sent from my iPad
  
  On 12 Feb 2013, at 08:08, Jens Kaiser kai...@caltech.edu wrote:
  
  Ethan,
  The last time I attempted similar things, I had to run rotaprep to
  convince scala of using most things that did not come directly out of
  mosflm, but that was before the pointless days. 
  As the reflections are already scaled in P1, I would consider it safe
  to rely on the Pointless Rmerge -- but that's just a guess (and you
  can't do much with the data downstream). I would assume sftools might be
  able to merge the reindexed file output by pointless.
   Nevertheless, if I were faced with the same problem nowadays, I would
  convert to a shelx hkl file and use xprep for the merging and statistics
  -- that's painless.
  
  Cheers,
  
  Jens
  
  On Mon, 2013-02-11 at 13:56 -0800, Ethan Merritt wrote:
  Hi all,
  
  I've downloaded a structure factor file from the PDB that presents
  itself as being triclinic.  It contains F, sig(F), and Rfree only.
  The P1-ness of this structure is dubious, however.
  
  Pointless is 99.6% sure it's orthorhombic and puts out an mtz file
  in P212121 containing 
I SIGI BATCH M/ISYM
  
  where the batch numbers are all 1 and ISYM runs from 1 to 8.
  So far so good, but now I'm stuck.  I can't persuade Scala
  or Aimless to merge the symmetry mates and report a merging
  R factor.Is there a trick to this?  Some other program sequence?
  
Ethan
  
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


[ccp4bb] Which program sequence to use in transforming from P1 to orthorhombic?

2013-02-11 Thread Ethan Merritt
Hi all,

I've downloaded a structure factor file from the PDB that presents
itself as being triclinic.  It contains F, sig(F), and Rfree only.
The P1-ness of this structure is dubious, however.

Pointless is 99.6% sure it's orthorhombic and puts out an mtz file
in P212121 containing 
I SIGI BATCH M/ISYM

where the batch numbers are all 1 and ISYM runs from 1 to 8.
So far so good, but now I'm stuck.  I can't persuade Scala
or Aimless to merge the symmetry mates and report a merging
R factor.Is there a trick to this?  Some other program sequence?

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] B-factors

2013-01-24 Thread Ethan Merritt
On Thursday, January 24, 2013 03:52:12 pm Urmi Dhagat wrote:
 Hi all, 
 
 I have been refining twinned data (at 3.1 A resolution) using refmac. My R 
 and Rfree values are 19.6 and 26.2 respectively with NCS restraints and 
 isotropic B-factor refinement. I am not sure whether it is a good idea to 
 refine individual B-factors at this resolution. 
 
 I have also tried refining the same model in phenix but this time not 
 refining the Bfactors. My Rfactor and Rfree are 25 and 32 respectively. 
 Refining with TLS in Phenix drops R factors to 23 and 29.

I would suspect it is possible to do better than that.

My thoughts on how to approach it were written up for a past CCP4 Study Weekend
and appeared in Acta D last year:

"To B or not to B"  Acta D 68:468 (2012).

You can find a link to the PDF on the TLSMD web site
http://skuld.bmsc.washington.edu/~tlsmd/references.html

Ethan


 Then I used the output PDB from phenix and refined it in CCP4 (selecting 
 overall B-factor refinement option instead of Isotropic) and my R factors are 
 R work=16 and Rfree =21.
 
 If Rfree reflections are refined by refmac upon switching from phenix to 
 refmac, then does this contaminate the Rfree set? Should switching between 
 refinement programs Phenix and Refmac be avoided?
 
 
 Urmi Dhagat
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] CASP ROLL needs your structures!

2013-01-19 Thread Ethan Merritt
On Saturday, 19 January 2013, Luecke, Hartmut wrote:
 
 In recent CASPs, there has been a shortage of new folds (according to PDB
 exactly zero new folds deposited since 2009) and membrane protein targets.

I have been wondering about that. It is true that the PDB has not listed any
new folds since 2009, but that hasn't stopped people from publishing new
structures and claiming they are new folds.   Is this because there is no
single recognized criterion for new in these cases?  
Or possibly the PDB hasn't updated their statistics since 2009?

E.g. from the 1st page of Google Scholar hits for "protein structure new fold":

2RSX    R Arai, S Fukui, N Kobayashi, J Sekiguchi - JBC 2012
Solution Structure of IseA, an Inhibitor Protein of 
dl-Endopeptidases from Bacillus subtilis, Reveals a Novel Fold 

3RX6    R Banerjee, S Nath, A Ranjan, S Khamrui, B Pani, R Sen, U Sen - 
JBC 2012
A search of the Protein Data Bank using the DALI server (19) 
and PDBeFold (20) did not produce any significant match with 
the Psu structure designating it to be a new fold

2L8K Ioannis Manolaridis, ..., Eric J. Snijder - J. Virology 2011
Structure and genetic analysis of the arterivirus nonstructural 
protein 7α

Ethan Merritt


 The lack of such targets makes it problematic to reliably quantify the state
 of the art in the area of protein structure prediction. To remedy this
 situation, CASP organizers have recently launched a new project called CASP
 ROLL (http://predictioncenter.org/casprol), where amino acids sequences of
 challenging targets are released throughout the year when structure solution
 is imminent. CASP specifically needs sequences of low-homology membrane
 targets that are about to be solved or have been solved but not released by
 PDB or elsewhere yet. It is important that structural information about the
 targets has not been publicly exposed (including things like coordinates,
 images, papers, conference abstracts) until after the prediction window for
 a given target has been closed.
 
 Each target will be available for prediction for a period of three to four
 weeks; in some cases a longer hold (up to 8 weeks) may be requested to
 allow the same target to be re-used for additional modeling experiments.
 
 So if you have anything suitable - please let CASP know. As you saw from the
 information above, your targets need not be fully refined structures. And if
 you need to make public a target's structure before the CASP window closes,
 simply contact CASP.  We would rather lose a few targets than not have any
 at all!
 
 A good perspective for solving the structure in a few months is a good
 enough assurance for CASP.  The submission mechanism is really simple. You
 can submit a target using the CASP Target Submission Form
 (http://predictioncenter.org/casprol/targets_submission.cgi), by sending
 email to c...@predictioncenter.org, or by marking your PDB deposition as
 CASP target in PDB's ADIT system (this way PDB will automatically put your
 target on hold for CASP for 8 weeks).
 
 Submission details can be found at
 http://predictioncenter.org/casprol/targets_submission.cgi
 
 Thanks and hoping for lots of targets.
 
 
 Hudel, UC Irvine
 
 
 
 
 This message contains confidential information and is intended only for the 
 individual named. If you are not the named addressee you should not 
 disseminate, distribute or copy this e-mail. Please notify the sender 
 immediately by e-mail if you have received this e-mail by mistake and delete 
 this e-mail from your system. E-mail transmission cannot be guaranteed to be 
 secure or error-free as information could be intercepted, corrupted, lost, 
 destroyed, arrive late or incomplete, or contain viruses. The sender 
 therefore does not accept liability for any errors or omissions in the 
 contents of this message, which arise as a result of e-mail transmission.
 


Re: [ccp4bb] About NCS and inhibitors

2013-01-07 Thread Ethan Merritt
On Monday, January 07, 2013 12:10:17 pm Edward A. Berry wrote:
 The idea is (whether it's valid or not) to apply the information from
 both sites simultaneously. If the density is pretty ambiguous and one side
 tends to drift off into an alternate conformation and the other drifts off
 into another conformation, but you have every reason to believe that
 the conformation is the same on both sides, applying NCS allows you to
 refine a single conformation that is consistent with both.

Ah, but what if you don't have every reason to believe
that the conformation is the same on both sides?

 More generally, is there any reason to not? I suppose ligands
 may be more likely than amino acids to violate NCS, but good
 practices would say examine each residue for violations.
 
 You could say, why enforce NCS on the 27'th residue of each chain, since
 their contribution to the number of parameters is small.
 (mind you,there may be good reasons to not constrain ligands
 that I m not aware of, if so I hope someone speak up)

The example mentioned earlier was HIV protease (a homodimer) in complex
with an asymmetric inhibitor. I.e. a single asymmetric ligand occupying a 
single site formed by 2 NCS-related chains.  Depending on what treatment
of NCS is available, it may or may not be reasonable to apply NCS
restraints to the protein.  Local geometry restraints may be
reasonable but coordinate restraints probably not.  

In practice you would normally expect the ligand to be stochastically 
present in one of 2 orientations, corresponding to the 2-fold NCS.
But if the ligand is not totally enclosed by the protein, the two
orientations may not be equally present and indeed might conceivably
not follow the overall 2-fold NCS of the protein.

To bring in an example that I've worked on myself, the cholera toxin
B-pentamer has 5 identical receptor binding sites related by 5-fold
NCS.  Small-molecule receptor analogs bound at those sites can extend well
beyond the protein envelope; their precise binding conformation is
affected by lattice contacts that obviously don't follow 5-fold NCS.
This is a case where applying the protein NCS to the ligands
would not make sense.


Ethan



 
 eab
 
 Boaz Shaanan wrote:
  Just a naive question: why apply NCS to ligands at all? Their contribution 
  to the number of parameters and hence to the param/obs ratio, the main 
  argument for applying NCS, is negligible, isn't it?
 
Boaz
 
  Boaz Shaanan, Ph.D.
  Dept. of Life Sciences
  Ben-Gurion University of the Negev
  Beer-Sheva 84105
  Israel
 
  E-mail: bshaa...@bgu.ac.il
  Phone: 972-8-647-2220  Skype: boaz.shaanan
  Fax:   972-8-647-2992 or 972-8-646-1710
 
 
 
 
 
  
  From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Edward A. 
  Berry [ber...@upstate.edu]
  Sent: Monday, January 07, 2013 6:12 PM
  To: CCP4BB@JISCMAIL.AC.UK
  Subject: Re: [ccp4bb] About NCS and inhibitors
 
  But I think the original poster meant partially overlapped after applying
  the ncs- operator- i.e. they are not ncs related but occupy partly the same
  position (in the two non-overlapping copies of the binding site).
 
  Then I guess it depends how clear the density is- If the density is not very
  clear and if the protein residues of the active site do follow ncs, I would 
  try
  rebuilding ligand b to match a and (separately) a to match b and refining
  with ncs applied to the ligand; and see if the resulting fit looks just as 
  good.
  eab
 
  Joel Sussman wrote:
  Dear All,
  Something like what Felix wrote is seen in the crystal structure of 
  *recombinant human acetylcholinesterase* (*rhAChE*)
  (PDB-ID: *3lii*), with two molecules are seen in the asymmetric unit.
  * In one molecule, the active-site gorge (where inhibitors normally lie) 
  is occupied with part of a peptide loop from a
  symmetrically related rhAChE.
  * While the corresponding region of the other copy of rhAChE is void of 
  this peptide.
  See figs 15-16 in:
  Dvir, H., Silman, I., Harel, M., Rosenberry, T. L.  Sussman, J. L. 
  (2010). Acetylcholinesterase: From 3D structure to
  function /Chemico-Biological Interactions/ *187*, 10-22.
  * So, in essence, no reason to ever assume that two copies in asymmetric 
  unit will be identical, or have identical
  inhibitors bound, or 'surrogate inhibitors' (like in this case) bound. 
  Sometimes differences are due to difference in
  crystal packing
  Best regards,
  Joel
 
 
  On 7 Jan 2013, at 11:58, Felix Frolow wrote:
 
  I apologise for typing blindly:
   if one is in, the second can't be
  FF
  Dr Felix Frolow
  Professor of Structural Biology and Biotechnology, Department of 
  Molecular Microbiology and Biotechnology
  Tel Aviv University 69978, Israel
 
  Acta Crystallographica F, co-editor
 
  e-mail: mbfro...@post.tau.ac.ilmailto:mbfro...@post.tau.ac.il
  Tel: ++972-3640-8723
  Fax: ++972-3640-9407
  Cellular: 0547 459 608
 
  On Jan 7, 2013, at 11:48 , Felix 
  

Re: [ccp4bb] vitrification vs freezing

2012-11-15 Thread Ethan Merritt
On Thursday, November 15, 2012 09:13:58 am you wrote:
 
 Hi folks,
 I have recently received a comment on a paper, in which referee #1 (excellent 
 referee, btw!) commented like this:
 
 crystals were vitrified rather than frozen.
 
 These were crystals grew in ca. 2.5 M sodium malonate, directly dip in liquid 
 nitrogen prior to data collection at 100 K.
 We stated in the methods section that crystals were frozen in liquid 
 nitrogen, as I always did.
 
 After a little googling it looks like I've always been wrong, and what we are 
 always doing is actually vitrifying the crystals.
 Should I always use this statement, from now on, or are there english/physics 
 subtleties that I'm not grasping?

What we aim for is vitrification: to make into a glass.
What we achieve is another matter.
Sometimes dipping into LN2 produces a partially ordered (non-glasslike)
state in the solvent that is bad for our diffraction experiment.

Either result, the desired glass or the unfortunately crystalline ice,
is an example of freezing: to make into a solid by removing heat.

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] vitrification vs freezing

2012-11-15 Thread Ethan Merritt
On Thursday, November 15, 2012 10:14:54 am Raji Edayathumangalam wrote:
 Hi Sebastiano,
 
 Elspeth Garman howls bloody murder everytime someone says they froze
 their crystals. I think her issue is with the description of the process of
 successfully flashcooling crystals in the presence of cryoprotectants as
 freezing. Freezing technically is understood to imply the formation of
 hexagonal ice 

Not according to common English usage or any of the dictionaries I
looked in.  
E.g. American Heritage Dictionary:
  Freeze 1.a. To pass from the liquid to the solid state by loss of heat.

It needn't refer to water at all, although that is the most common context.
You can find instructions for freezing olive oil to preserve it;  
when I lived in Madison one occasionally had to worry about frozen 
engine oil;  a headline from earlier this year claimed 
Russian rivers clogged with frozen oil.

 while what one really means is the successful solidification
 of water in a random orientation (vitrification) and the prevention of the
 hexagonal ice.
 
 Semantics semantics!
 
 I'd stick with flashcooled or something along those lines.
 Raji

Funny you should say that :-)
While I have never had a referee complain about frozen crystals,
I have had several complain that flash cooling is different from
immersing in liquid nitrogen.  I never figured out what they had
in mind, but have since tried to avoid the term flash cooling.

By the way, cryo-cooled must be a term advocated by 
The Department of Redundancy Department.
cryo - From Greek kruos, icy cold

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] side chain density

2012-11-09 Thread Ethan Merritt
On Friday, 09 November 2012, Faisal Tarique wrote:
 Dear all
 
 i have solved a structure ( at 2A resolution) whose Rwork and Rfree is 22
 and 25 respectively..the Ramachandran plot shows 90% of the residues in the
 most favorable region and with 6 residues in generously allowed and no
 residues in disallowed region. But in some areas i can see density missing
 for side chains ( in loop regions )..i have question do i need to mutate
 them to alanine or leave them as such.

Mutating to alanine is not an option.
They are not alanine.
If nothing else, when you get to the point of depositing your
structure in the PDB it will fail validation checks because
the sequence is not correct at those points.

But if you mean should you delete sidechain atoms beyond
CB, that is another question.  That is a legitimate option.
I suggest trying that and then looking in difference density
maps to see if any density shows up to guide placement of
the sidechain.

Ethan

 .The density fit analysis in COOT (
 traffic light) showing those regions with side chain as red..
 
 thanx in advance
 
 Regards
 
 Faisal
 School of Life Sciences
 JNU
 


Re: [ccp4bb] Ca or Zn

2012-10-30 Thread Ethan Merritt
On Tuesday, October 30, 2012 01:44:43 pm Adrian Goldman wrote:

 The coordination is indicative but not conclusive but, as I responded to the 
 original poster, I think the best approach is to use anomalous scattering.  
 You can measure just below and above the Ca edge, 

Actually, you can't.  The Ca K-edge is at 3.07Å, which is not a wavelength
amenable to macromolecular data collection.  

cheers,

Ethan


 and similarly with the Zn, and those maps will be _highly_ indicative of the 
 relative amounts of metal ion present.  In fact, you can deconvolute so that 
 you know the occupancy of the metals at the various sites.
 
 Adrian
 
 
 On 30 Oct 2012, at 22:37, Chittaranjan Das wrote:
 
  Veerendra,
  
  You can rule out if zinc has replaced calcium ions (although I agree with 
  Nat and others that looking at the coordination sphere should give a big 
  clue) by taking a few crystals, washing them a couple of times and 
  subjecting them to ICP-MS analysis, if you have access to this technique. 
  You can learn how many zinc, if any, have bound per one protein molecule in 
  the dissolved crystal.
  
  Best
  Chitta
  
  
  
  - Original Message -
  From: Veerendra Kumar veerendra.ku...@uconn.edu
  To: CCP4BB@JISCMAIL.AC.UK
  Sent: Tuesday, October 30, 2012 2:55:33 PM
  Subject: [ccp4bb] Ca or Zn
  
  Dear CCP4bb users,
  
  I am working on a Ca2+ binding protein. it has 4-5 ca2+ binding sites.  I 
  purified the protein  in presence of Ca2+ and crystallized the Ca2+ bound 
  protein. I got crystal and solved the structure by SAD phasing at 2.1A 
  resolution. I can see the clear density in the difference map for metals at 
  the expected binding sites. However I had ZnCl2 in the crystallization 
  conditions. Now i am not sure whether the observed density is for Ca or Zn 
  or how many of them are Ca or Zn? Since Ca (20 electrons) and Zn (30 
  electrons), can this difference be used to make a guess about 
  different ions? 
  is there any way we can find the electron density value at different peaks? 
  
  Thank you
  
  Veerendra 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Ca or Zn

2012-10-30 Thread Ethan Merritt
On Tuesday, 30 October 2012, Jrh wrote:
 This paper describes use of data either side of the calcium edge:-
 
 http://dx.doi.org/10.1107/S0907444905002556

I think that counts as "not amenable" (which is not quite the same
as "impossible").  From the Methods section of that paper:

  Measurements in the vicinity of the K absorption edge of
  calcium (3.07 Å) are close to or beyond the physical limit
  of most beamlines typically used for X-ray crystallography
  [...] It was not possible to observe interpretable
  diffraction patterns at λ = 3 Å with the weakly diffracting
  furin crystals using the MAR CCD detector and exposure
  times up to 20 min per degree of rotation.

They did soldier on and managed to collect extremely weak data
below the Ca edge and stronger but still very weak data above the
edge where the Ca f'' term was appreciable.  But this is far from a
routine experiment.

Another approach dating back to work in 1972 by Peter Coleman
and Brian Matthews http://dx.doi.org/10.1016/0006-291X(72)90750-4
is to replace the Ca with a rare earth having similar chemistry 
(e.g. La, whose L-1 edge is at 1.98Å).  


 This next paper describes a case of gallium and zinc mix at 
 one site with occupancy AND sigmas estimated with different software. 
 This example is however much better diffraction resolution than 
 that you may have. But hopefully will still be of interest:-
 http://dx.doi.org/10.1107/S0108768110011237

Ga and Zn, sure.  That's an easy one. 
The Ga edge is at 1.196Å and the Zn edge is at 1.284Å,
both edges are nicely in range for data collection and they are
close enough together that little or no beamline readjustment
is needed when jumping from one to the other.

Ethan



 
 Prof John R Helliwell DSc
  
  
 
 On 31 Oct 2012, at 04:53, Ethan Merritt merr...@u.washington.edu wrote:
 
  On Tuesday, October 30, 2012 01:44:43 pm Adrian Goldman wrote:
  
  The coordination is indicative but not conclusive but, as I responded to 
  the original poster, I think the best approach is to use anomalous 
  scattering.  You can measure just below and above the Ca edge, 
  
  Actually, you can't.  The Ca K-edge is at 3.07Å, which is not a wavelength
  amenable to macromolecular data collection.  
  
 cheers,
  
 Ethan
  
  
  and similarly with the Zn, and those maps will be _highly_ indicative of 
  the relative amounts of metal ion present.  In fact, you can deconvolute 
  so that you know the occupancy of the metals at the various sites.
  
  Adrian
  
  
  On 30 Oct 2012, at 22:37, Chittaranjan Das wrote:
  
  Veerendra,
  
  You can rule out if zinc has replaced calcium ions (although I agree with 
  Nat and others that looking at the coordination sphere should give a big 
  clue) by taking a few crystals, washing them a couple of times and 
  subjecting them to ICP-MS analysis, if you have access to this technique. 
  You can learn how many zinc, if any, have bound per one protein molecule 
  in the dissolved crystal.
  
  Best
  Chitta
  
  
  
  - Original Message -
  From: Veerendra Kumar veerendra.ku...@uconn.edu
  To: CCP4BB@JISCMAIL.AC.UK
  Sent: Tuesday, October 30, 2012 2:55:33 PM
  Subject: [ccp4bb] Ca or Zn
  
  Dear CCP4bb users,
  
  I am working on a Ca2+ binding protein. it has 4-5 ca2+ binding sites.  I 
  purified the protein  in presence of Ca2+ and crystallized the Ca2+ bound 
  protein. I got crystal and solved the structure by SAD phasing at 2.1A 
  resolution. I can see the clear density in the difference map for metals 
  at the expected binding sites. However I had ZnCl2 in the crystallization 
  conditions. Now i am not sure whether the observed density is for Ca or 
   Zn or how many of them are Ca or Zn? Since Ca (20 electrons) and Zn (30 
   electrons), can this difference be used to make a guess about 
   different ions? 
  is there any way we can find the electron density value at different 
  peaks? 
  
  Thank you
  
  Veerendra 
  
  
 


Re: [ccp4bb] PNAS on fraud

2012-10-19 Thread Ethan Merritt
On Friday, October 19, 2012 10:12:44 am Colin Nave wrote:
 This is worth looking at as well. Suggests most papers should be retracted!
 http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124

  A paper claiming that all papers are false, by someone named Ioannidis?
  I wonder if he is from Crete :-)

E for channeling Epimenides Merritt
-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] PNAS on fraud

2012-10-18 Thread Ethan Merritt
On Thursday, October 18, 2012 10:52:48 am DUMAS Philippe (UDS) wrote:
 
 Le Jeudi 18 Octobre 2012 19:16 CEST, Bernhard Rupp (Hofkristallrat a.D.) 
 hofkristall...@gmail.com a écrit: 
 
 I had a look to this PNAS paper by Fang et al.
 I am a bit surprised by their interpretation of their Fig. 3: 
 they claim that there exists a highly significant correlation between 
 Impact factor and number of retractions. 
 Personally, I would have concluded that there is a complete lack of correlation...
 Should I retract this judgment?

Fang et al. claim that R^2 = 0.0866, which means that CC = sqrt(0.0866) ≈ 0.29.
While a correlation coefficient of less than 0.3 is not
a complete lack of correlation, it's still rather weak.

The "highly significant" must be taken in a purely statistical sense.
That is, it doesn't mean the measures are highly correlated, it
means the evidence for non-zero correlation is very strong.

Ethan


 Philippe Dumas
  
  Dear CCP4 followers,
  
  Maybe you are already aware of this interesting study in PNAS regarding the
  prevalence of fraud vs. 'real' error in paper retractions:
  
  Fang FC, Steen RG and Casadevall A (2012) Misconduct accounts for the
  majority of retracted scientific publications. Proc Natl Acad Sci U S A
  109(42): 17028-33.
  
  http://www.pnas.org/content/109/42/17028.abstract
  
  There were also a few comments on related stuff such as fake peer review in
  the Chronicle of Higher Education. As not all may
  have access to that journal, I have put the 3 relevant pdf links on my web site: 
  
  http://www.ruppweb.org/CHE_Misconduct_PNAS_Stuft_Oct_2012.pdf
  http://www.ruppweb.org/CHE_DYI_reviews_Sept_30_2012.pdf
  http://www.ruppweb.org/CHE_The-Great-Pretender_Oct_8_2012.pdf
  
  
  Best regards, BR
  -
  Bernhard Rupp
  001 (925) 209-7429
  +43 (676) 571-0536
  b...@ruppweb.org
  hofkristall...@gmail.com
  http://www.ruppweb.org/
  -
  
  
  
  
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] anisotropic refinement

2012-10-11 Thread Ethan Merritt
On Thursday, October 11, 2012 11:50:37 am Rex Palmer wrote:
 Dear CCP4'ers
 With the occurrence of more and more high resolution protein structures does 
 anyone know at present how many such structures have been successfully 
 refined anisotropically?

When we tried to categorize refinement protocols in the PDB at the end
of 2009 we identified about 1200 protein structures that had been given
full anisotropic treatment.  Zucker et al, Acta Cryst. (2010). D66, 889–900

However, using automated search of the PDB it is hard to distinguish
full aniso refinement from structures refined with TLS but having missing 
or malformed TLS records.  

As to "successfully", that's a separate question :-) 
Maybe Robbie Joosten has more recent numbers from the PDB-Redo project, 
and a comment on success?

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] B-iso vs. B-aniso

2012-09-17 Thread Ethan Merritt
On Monday, September 17, 2012 11:31:53 am Yuri Pompeu wrote:
 Dear community,
 
 The protein model I am refining has 400 amino acids (3320 atoms).
 Some real quick calculations tell me that to properly refine it 
 anisotropically, I would need 119,520 observations. Given my unit-cell 
 dimension and space-group it is equivalent to about a 1.24 A complete data 
 set.
 However, I have had a couple of cases where anisotropic B-factor refinement 
 significantly improved R-work and R-free, while maintaining a reasonable gap 
 for lower resolution models (1.4-1.5 A, around 70,000 reflections). What is 
 the proper way of modelling the B-factors?
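
(As an aside, a quick check of the arithmetic quoted above, assuming the usual
counting of 9 refined parameters per atom, i.e. x, y, z plus 6 Uij, and a target
of roughly 4 observations per parameter; the 4:1 ratio is my assumption rather
than anything stated in the post:

    3320 atoms x 9 parameters/atom = 29,880 parameters
    29,880 parameters x 4 observations/parameter = 119,520 observations.)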

I laid out my thoughts on this topic at last year's CCP4 Study Weekend.
The print version of it may be found here:

   To B or not to B: a question of resolution? 
   Acta Cryst. D68, 468-477. 
   http://dx.doi.org/10.1107/S0907444911028320

One lesson is that lower R-work and R-free does not necessarily indicate that
anisotropic refinement is justified.  In other words, it is not so easy to
determine how much improvement is significant improvement.


 Any thoughts and/or opinions from the community are welcome.
 Cheers, 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Ethan Merritt
On Wednesday, September 12, 2012 07:32:54 am Jacob Keller wrote:
 Dear List,
 
 since this probably comes up a lot in manipulation of pdb/reflection files
 and so on, I was curious what people thought would be the best language for
 the following: I have some huge (100s MB) tables of tab-delimited data on
 which I would like to do some math (averaging, sigmas, simple arithmetic,
 etc) as well as some sorting and rejecting. It can be done in Excel, but
 this is exceedingly slow even in 64-bit, so I am looking to do it through
 some scripting. Just as an example, a sort which takes 10 min in Excel
 takes ~10 sec max with the unix command sort (seems crazy, no?). Any
 suggestions?

For the specific purpose you list -
input from tab-delimited data
output to simple statistical summaries and (I assume) plots
- it sounds like gnuplot could do the job nicely.

Otherwise I'd recommend perl, and dis-recommend python.
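
Purely as an illustration of the kind of streaming computation being discussed
(a minimal sketch, not an endorsement of any particular language; the file name,
column index, and rejection rule are all made up), the averaging/sigma/rejection
step might look like:

    import csv
    import math

    COLUMN = 2                    # 0-based index of the column of interest (assumed)
    n, s, ss = 0, 0.0, 0.0
    kept = 0

    with open("bigtable.tsv", newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            try:
                x = float(row[COLUMN])
            except (ValueError, IndexError):
                continue          # skip headers and malformed rows
            n += 1
            s += x
            ss += x * x
            if x > 0.0:           # stand-in for whatever rejection rule is wanted
                kept += 1

    mean = s / n
    sigma = math.sqrt(max(ss / n - mean * mean, 0.0))
    print("n = %d  mean = %g  sigma = %g  kept = %d" % (n, mean, sigma, kept))

A one-pass loop like this never holds the whole table in memory, which is why it
scales so much better than a spreadsheet on a file of hundreds of megabytes;
gnuplot's "stats" command or a perl one-liner would do the same job.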

Ethan


-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Aimless and Pointless

2012-09-12 Thread Ethan Merritt
On Wednesday, 12 September 2012, Cosmo Z Buffalo wrote:
 Hi all,
 
 I am currently trying to perform a quickscale in iMosflm 7.0.9 after I 
 integrate in an R 32 space group.  Unfortunately, Pointless and Aimless 
 are both giving me a best solution space group of P 43 3 2.  After analyzing 
 the statistics, this cannot be correct.  Other programs such as HKL2000 have 
 confirmed this to be true.  So my question: is it possible to force Aimless 
 and Pointless to generate statistics in a space group other than the one it 
 predicts?  And if so, how would I do this?

Pointless is usually very good at detecting additional symmetry elements.

So I'm idly curious - what does Pointless report as the correlation
coefficient and R-merge for the extra symmetry elements in P 43 3 2?
Could you show us the whole symmetry table?

Have you gotten far enough to see if there are NCS copies whose
positioning mimics the cubic symmetry?

Ethan

 
 -Cosmo
 


Re: [ccp4bb] Refmac: ADP is non-positive

2012-07-25 Thread Ethan Merritt
I can confirm the problem.

If I take a previous successful refmac refinement, remove the initial N atom 
from
the first residue in the input PDB file, and re-run the refinement I get:

 Problem with the ADP of the atom N   A 33 ADP is non-positive  
-94.380859
 Problem with the ADP of the atom N   A 33 ADP is non-positive  
-94.382088

starting already in CGMAT cycle 1.

Deleting instead the C-terminal O atom[s] does not produce an equivalent error.

CCP4 6.3: Refmac_5.7.0029 

Nevertheless, in my test case the refinement ran successfully despite the error
messages.

Ethan



On Tuesday, July 24, 2012 09:54:26 am wtempel wrote:
 There are atom records for C, O and CA (B-factors 42, 43, 40A**2,
 respectively), but not for N, as density tapers off going to the amino
 terminus (well, without the amino in this case). Residue 3 is the
 lowest-numbered residue in its chain. B-factors of N, CA of residue 4 are
 38, 33A**2, respectively. Could refmac just be taking exception to the
 missing N atom?
 
 -- Forwarded message --
 From: Ethan Merritt merr...@u.washington.edu
 Date: Tue, Jul 24, 2012 at 11:27 AM
 Subject: Re: [ccp4bb] Refmac: ADP is non-positive
 To: wtempel wtem...@gmail.com
 Cc: CCP4BB@jiscmail.ac.uk
 
 
 On Tuesday, 24 July 2012, wtempel wrote:
  CCP4ers,
 
  a log file from Refmac_5.7.0027 presents me with this line:
 
  fromLog
  Problem with the ADP of the atom N   A  3 ADP is non-positive
  -1.7740907E+35
  /fromLog
 
  I did not explicitly refine ADPs or TLS.
  Should I modify my model when I encounter such a message? If yes, does the
  message refer to a specific atom, such as atom N of residue 3 in chain A?
 I
  should note that that atom is omitted from my model due to lack of
 electron
  density/disorder.
 
 What do you mean by omitted from the model?
 Are there no ATOM records for that residue in the PDB file?
 What are the B factors for the other atoms in that region?
 
 Ethan
 
 
  Many thanks in advance,
  Wolfram Tempel
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Refmac: ADP is non-positive

2012-07-24 Thread Ethan Merritt
On Tuesday, 24 July 2012, wtempel wrote:
 CCP4ers,
 
 a log file from Refmac_5.7.0027 presents me with this line:
 
 fromLog
 Problem with the ADP of the atom N   A  3 ADP is non-positive
 -1.7740907E+35
 /fromLog
 
 I did not explicitly refine ADPs or TLS.
 Should I modify my model when I encounter such a message? If yes, does the
 message refer to a specific atom, such as atom N of residue 3 in chain A? I
 should note that that atom is omitted from my model due to lack of electron
 density/disorder.

What do you mean by omitted from the model?
Are there no ATOM records for that residue in the PDB file?
What are the B factors for the other atoms in that region?

Ethan

 
 Many thanks in advance,
 Wolfram Tempel
 


Re: [ccp4bb] How to identify unknow heavy atom??

2012-07-24 Thread Ethan Merritt
On Tuesday, July 24, 2012 10:22:18 am Nat Echols wrote:
 On Tue, Jul 24, 2012 at 10:14 AM, Haytham Wahba haytham_wa...@yahoo.com 
 wrote:
  1- if i have anomalous peak of unknown heavy atom, How can i identify this
  heavy atom in general. (different methods)
 
  2- in my case, i see anomalous peak in heavy atom binding site (without any
  soaking). preliminary i did mass spec. i got Zn++ and Cu, How can i know
  which one give the anomalous peak in my protein.
 
  3- there is way to know if i have Cu+ or Cu++.
 
 You may be able to identify the element based on the coordination
 geometry - I'm assuming (perhaps incorrectly) that it is actually
 different for Cu and Zn.  Marjorie Harding has written extensively on
 the geometry of ion binding:
 
 http://tanna.bch.ed.ac.uk/
 
 The only way to be certain crystallographically, if you have easy
 access to a synchrotron, is to collect data above and below the K edge
 of any candidate element, and compare the difference maps.  (For
 monovalent ions it is more complicated, since they don't have
 accessible K edges.)  On a home source, Cu should have a larger
 anomalous map peak, but I'm not sure if this will be enough to
 identify it conclusively.

As to the SR experiment - yes.

As to the home source - no.  
Neither Cu nor Zn has appreciable anomalous signal when excited with a 
Cu K-alpha home source.
  http://www.bmsc.washington.edu/scatter

An element's emission edge (Cu K-alpha in this case) is about 1 keV below
the corresponding absorption edge.  This makes sense, because after
absorbing a photon it can only emit at an equal or lower energy, not a
higher energy.  So you can't reach the Cu absorption edge, where the
anomalous signal is, by exciting with Cu K-alpha.
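
To make that gap concrete, here is a minimal sketch (my addition, not part of
the original exchange) converting the wavelengths mentioned in this thread to
photon energies with the standard approximation E[keV] = 12.398 / lambda[Angstrom];
the numerical values are approximate:

    HC_KEV_ANGSTROM = 12.398          # approximate value of h*c in keV * Angstrom

    def energy_kev(wavelength_angstrom):
        """Photon energy in keV for a wavelength given in Angstrom."""
        return HC_KEV_ANGSTROM / wavelength_angstrom

    lines = [
        ("Cu K-alpha emission (home source)", 1.542),
        ("Cu K absorption edge (approx.)",    1.381),
        ("Zn K absorption edge",              1.284),   # value quoted earlier
    ]

    for name, wl in lines:
        print("%-34s %6.3f A  ->  %5.2f keV" % (name, wl, energy_kev(wl)))

    # Cu K-alpha comes out near 8.0 keV, about 1 keV below the Cu K edge
    # (about 9.0 keV) and well below the Zn K edge (about 9.7 keV),
    # so neither edge can be excited by Cu K-alpha radiation.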

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Chiral volume outliers SO4

2012-07-15 Thread Ethan Merritt
On Sunday, 15 July 2012, Dale Tronrud wrote:
There are good reasons for maintaining order in this human-induced
 numbering scheme.  A common operation is to superimpose two molecules
 and calculate the rmsd of the positional differences.  This calculation
 is not useful when the Val CG1 and CG2 are swapped in one molecule relative
 to the other.  Suddenly you have, maybe a handful, of atoms that differ
 in position by about 3.5 A when most of us would consider this to be
 nonsense.  We want the rmsd between equivalent atoms regardless of the
 human-induced numbering scheme.  There are two ways this can come about.
 1) The overlay program could swap the labels on one to match the other or
 2) The labels can be defined to be consistent from the start.

  3) The closest-to-superimposed atoms could be paired regardless of 
 their labels
  4) Chemically equivalent atoms could be given the same name, which
 is then not unique but allows it to match any other atoms with
 the same name

I'd normally choose (3) in practice, because it's the only method that
works reliably without universal agreement going forward and universal
remediation looking backward.
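
A minimal sketch of option (3), pairing chemically equivalent atoms by proximity
rather than by label before computing their RMSD contribution; the coordinates
below are invented purely for illustration:

    import math
    from itertools import permutations

    def dist2(a, b):
        """Squared distance between two 3D points."""
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    def best_pairing_sq(ref, mov):
        """Smallest sum of squared distances over all label-blind pairings of
        one equivalence class (e.g. Val CG1/CG2); lists must have equal length."""
        return min(sum(dist2(ref[i], mov[j]) for i, j in enumerate(perm))
                   for perm in permutations(range(len(mov))))

    # Val CG1/CG2 with labels swapped between two already-superposed models
    ref = [(1.00, 0.00, 0.0), (-1.00, 0.00, 0.0)]    # CG1, CG2 in model A
    mov = [(-1.10, 0.10, 0.0), (1.05, -0.05, 0.0)]   # labelled CG1, CG2 in model B

    n = len(ref)
    rmsd_by_label = math.sqrt(sum(dist2(a, b) for a, b in zip(ref, mov)) / n)
    rmsd_blind = math.sqrt(best_pairing_sq(ref, mov) / n)
    print("label-matched RMSD %.2f A, label-blind RMSD %.2f A"
          % (rmsd_by_label, rmsd_blind))

For the swapped-label case above, the label-matched RMSD is about 2 A while the
label-blind RMSD is about 0.1 A, which is the number most of us actually want.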

cheers,

Ethan


Re: [ccp4bb] do you think it is interesting?

2012-06-18 Thread Ethan Merritt
On Monday, June 18, 2012 02:06:46 pm Alexander Scouras wrote:
  I'm further racking my brain to figure out a biological implication of this 
  behaviour, I thought something like plaque formation but I can't find 
  support in literature.
 
 
 There are a variety of domain swapped crystal structures out there, but at 
 least the two I'm most familiar with are regarded as being crystallization 
 artifacts. I think I recall seeing examples where domain swapping was 
 biologically relevant, but my impression is most are red herrings. 

You might be interested in the following paper, which describes
domain-swapped (domain exchange) dimerization as a control mechanism for 
kinases.

 Activation segment dimerization: a mechanism for kinase 
 autophosphorylation of non-consensus sites.
 Pike, A.C.W.,  Rellos, P.,  Niesen, F.H.,  Turnbull, A.,  Oliver, A.W.,
 Parker, S.A.,  Turk, B.E.,  Pearl, L.H.,  Knapp, S.,  
 Journal: (2008) Embo J. 27: 704

But these are specifically dimeric.  Unlike the case posted here,
there is not a second non-swapped interface that would allow 
formation of an infinite chain.

Ethan



 
 In the poster child of plaque formation, prion protein formed cys-cross 
 linked domain swapped dimers in some crystals. 
 
 http://www.nature.com/nsmb/journal/v8/n9/abs/nsb0901-770.html
 
 However, using PAGE  DLS it was later shown that prion has no preference for 
 dimers when you break down Infectious fibrils. Cross linked dimers definitely 
 out. Any subunits ruled out, in fact. 
 
 http://www.nature.com/nature/journal/v437/n7056/abs/nature03989.html
 
 
 RNaseA is another example, and isn't even a disease associated molecule. 
 Similarly to how we've found that many/most proteins may be converted to 
 amyloid forms by harsh enough conditions, I think some will domain swap, and 
 some authors have pursued domain swapping heavily with RNaseA a as a model 
 for amyloid formation. RNaseA will swap in major and minor conformations 
 even, though not in the same crystal. Still, that's the first thing you need 
 for an infinite series, is two compatible/simultaneous swapping points. 
 
 
 Now, I do think domain swapping, particularly an infinite chain, can be 
 interesting from a bioengineering or biophysical level, if that is what you 
 are interested in. I just want to say that there is a high bar to showing 
 biochemical relevance in the sense of holding any physiological implications. 
 
 
 Alexander D. Scouras
 Postdoctoral Fellow
 Alber Lab, QB3
 University of California, Berkeley

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] how to interpret DALI search results

2012-06-12 Thread Ethan Merritt
On Tuesday, June 12, 2012 02:29:13 pm Jerry McCully wrote:
 
 Dear ALL;
 
 After we solved our structure by anomalous scattering, we did a DALI 
 search. Here are the results, but it is not easy to draw meaningful 
 conclusions about whether our structure represents a novel fold or is homologous to 
 others.

I don't think the question "is it homologous" can ever be answered by 
a DALI score, whether it is high or low.  For that you need to think
about sequence families, inferred evolutionary history, shared function,
etc.

If your structure has a biological function related to that of your DALI
hit, I'd be inclined to consider seriously whether they could be distant
homologs.  If not, I doubt you will convince anyone based only on a 
vaguely similar fold.

A sequence identity of 15% is really low, but that is presumably only
a pairwise comparison.  You should try PSIBLAST or similar to see if
both your protein and the DALI hit are recognizably members of the
same sequence family or superfamily.

Ethan



 
Basically the Z-score is between 2 and 6.4 since our structure only 
 contains 130 residues. Sequence identity is between 5 and 15%.
 
The RMSD of the structural alignment is between 2.5 and 6 angstrom.  
 
Any suggestions on how to interpret the DALI results? Many thanks,
 
 Jerry McCully
 
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] lithium incorporation in refmac

2012-06-08 Thread Ethan Merritt
On Friday, June 08, 2012 11:35:25 am Faisal Tarique wrote:
 Dear all
 
 i have downloaded lithium coordinates for the density i guess is for
 lithium but i think while refinement in refmac is not taking lithium into
 the consideration. 

If you see density, it might not be lithium :-)

 i want to know how to obtain cif file for lithium and
 incorporate it into the refmac for refinement..

LI is in the standard monomer dictionary. 
You don't have to do anything special.
But given that it has no scattering power to speak of, you may
have to add explicit restraints to hold it in place during
refinement.

LI appears in 41 PDB entries.  You might want to inspect the
density in some of them to get a feel for how it looks.

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Fun Question - Is multiple isomorphous replacement an obsolete technique?

2012-06-05 Thread Ethan Merritt
On Tuesday, 05 June 2012, Stefan Gajewski wrote:
 Hey!
 
 I was just wondering, do you know of any recent (~10y) publication that
 presented a structure solution solely based on MIR? Without the use of any
 anomalous signal of some sort?

A text search for MIR returns 1377 PDB structures overall.
Of these 706 were deposited in the last 10 years,
and 34 were deposited in the last 12 months.

The most recent was released today (6 Jun 2012)
HEADERHYDROLASE   17-APR-12   4EPC  
TITLE CRYSTAL STRUCTURE OF AUTOLYSIN REPEAT DOMAINS FROM STAPHYLOCOCCUS
REMARK 200 DIFFRACTION PROTOCOL: SINGLE WAVELENGTH  
REMARK 200 METHOD USED TO DETERMINE THE STRUCTURE: MIR  
REMARK 200 SOFTWARE USED: SOLVE 
REMARK 200 STARTING MODEL: NULL 

Caveats:
I have no idea how many of those structures say MIR because it's part
of the protein name or some such, I have no idea how accurate the
REMARK 200 fields are in any case, and I don't really trust the 
www.pdb.org search interface in general.
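
For anyone who wants to sidestep the web interface, a minimal sketch of the
same kind of count run over a local mirror of PDB-format files; the directory
name is hypothetical, and the result is only as reliable as the REMARK 200
records themselves:

    import glob

    hits = 0
    for path in glob.glob("pdb_files/*.pdb"):       # hypothetical local mirror
        with open(path) as fh:
            for line in fh:
                if line.startswith("REMARK 200") and "DETERMINE THE STRUCTURE" in line:
                    if "MIR" in line:               # note: also matches MIRAS
                        hits += 1
                    break                           # one method line per entry

    print("%d entries report MIR" % hits)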

 When was the last time you saw a structure that was solved without the use
 of anomalous signal or homology model? Is there a way to look up the answer
 (e.g. filter settings in the RCSB) I am not aware of?
 
 Thanks,
 S.
 
 (Disclaimer: I am aware that isomorpous data is a valuable source of
 information)
 


Re: [ccp4bb] Fwd: [ccp4bb] Death of Rmerge

2012-05-31 Thread Ethan Merritt
On Thursday, May 31, 2012 02:21:45 pm Dale Tronrud wrote:
The resolution limit of the data set has been such an important
 indicator of the quality of the resulting model (rightly or wrongly)
 that it often is included in the title of the paper itself.  Despite
 the fact that we now want to include more, weak, data than before
 we need to continue to have a quality indicator that readers can
 use to assess the models they are reading about.  While cumbersome,
 one solution is to state what the resolution limit would have been
 had the old criteria been used, as was done in the paper you quote.
 This simply gives the reader a measure they can compare to their
 previous experiences.

[\me dons flame suit]

To the extent that reporting the resolution is simply a stand-in
for reporting the quality of the model, we would do better to cut
to the chase.  For instance, if you map the Molprobity green/yellow/red
model quality scoring onto good/mediocre/poor then you can title
your paper

   Crystal Structure of Fabulous Protein Foo at Mediocre Quality

[\me removes flame suit from back, and tongue from cheek]


More seriously, I don't think it's entirely true that the resolution
is reported as an indicator of quality in the sense that the model
is well-refined.  There are things you can expect to learn from a
2Å structure that you are unlikely to learn from a 5Å structure, even
if equal care has been given to both experiments, so it makes sense
for the title to give the potential reader an idea which of the two
cases is presented.  But for this purpose it isn't going to matter
whether 2Å is really 1.8Å or 2.2Å.  

Now would be a good time to break with tradition and institute
 a new measure of quality of diffraction data sets.  I believe several
 have been proposed over the years, but have simply not caught on.
 SFCHECK produces an optical resolution.  Could this be used in
 the title of papers?  I don't believe it is sensitive to the cutoff
 resolution and it produces values that are consistent with what the
 readers are used to.  With this solution people could include whatever
 noisy data they want and not be guilty of overstating the quality of
 their model.

We should also encourage people not to confuse the quality of 
the data with the quality of the model.

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Deposition of riding H

2012-05-12 Thread Ethan Merritt
On Saturday, 12 May 2012, Yuri Pompeu wrote:
 If you used riding hydrogens throughout refinement and arrived at a final 
 model that you believe best describes your x-ray data to a certain level of 
 accuracy (Rvalues, geometry, map CC, etc...) would you not be invalidating 
 the whole refinement process by going in and removing the hydrogen atoms 
 right before deposition?

My view:

You are not removing hydrogen atoms at all. You are stating that the model
being deposited includes riding hydrodens.  The consumer of your model can
regenerate the individual hydrogen coordinates from that information if
needed, just as refmac does when you start a new refinement cycle with the
riding hydrogen model selected.  You don't need to output the individual 
hydrogen coordinates between cycles, or at deposition time, because they
are adequately described by the riding hydrogen model.

You might as well ask why do we remove all copies of the molecules in
the crystal except for those in a single asymmetric unit?
They are not really removed; they are implicit in the statement of
the crystallographic symmetry.

Ethan


Re: [ccp4bb] P21221 to P21212 conversion

2012-05-07 Thread Ethan Merritt
On Monday, May 07, 2012 01:09:25 pm Shya Biswas wrote:
 Hi all,
 I was wondering if anyone knows how to convert the P21221 to P21212
 spacegroup in HKL2000. I scaled the data set in P21212 in HKL 2000 but I
 got a correct MR solution in P21221 spacegroup. 

Shya:

Scaling is done in a point group, not a space group.

The point group P222 contains both space groups P2(1)22(1) and P2(1)2(1)2,
so your original scaling is correct in either case.

It is not clear from your query which of two things happened: 

1) The MR solution kept the same a, b, and c axis assignments but made a 
different call on whether each axis did or did not correspond to a 2(1) screw.
In this case you don't need to do anything to your files.  Just make sure
that you keep the new space group as you go forward into refinement.

2) The MR solution kept the orginal screw-axis identifications but 
permuted the axes to the standard setting (non-screw axis is labelled c).
In this case you will need to construct a file containing the permuted
indices.  For example, the reflection originally labeled  (h=1 k=2 l=3) is now
(h=3 k=1 l=2).  There are several programs that can help you do this,
including the HKL2000 GUI.   But you do not need to go back into HKL
if you don't want to.  You could, for example, use the ccp4i GUI to
select
- Reflection Data Utilities
   - Reindex Reflections
  Define Transformation Matrix by entering reflection transformation
  h=l k=h l=k
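
As a minimal illustration (my addition) of what the transformation h=l, k=h, l=k
does to an individual index triple; an actual data file would of course be
reindexed with the ccp4i task above or the REINDEX program rather than by hand:

    def permute_hkl(h, k, l):
        """Apply the reindexing h=l, k=h, l=k:
        new h = old l, new k = old h, new l = old k."""
        return (l, h, k)

    print(permute_hkl(1, 2, 3))   # -> (3, 1, 2), matching the example above
    # This is a cyclic permutation (determinant +1), so handedness is preserved;
    # swapping only two indices (e.g. hkl -> hlk) would invert the coordinate system.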


Ethan


 I have a script file that
 runs with scalepack but was wondering if there is an easier way to do it
 with HKL2000 gui mode.
 thanks,
 Shya
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] P21221 to P21212 conversion

2012-05-07 Thread Ethan Merritt
On Monday, May 07, 2012 01:42:58 pm Shya Biswas wrote:
 Hi,
 My case is old b is changed to c (scenario 2 as you explained) or hkl is
 changed to hlk. Thanks for the help

hkl -> hlk gives an inverted coordinate system.  You don't want that.

Ethan


 
 Shya
 
 On Mon, May 7, 2012 at 4:33 PM, Ethan Merritt merr...@u.washington.eduwrote:
 
  On Monday, May 07, 2012 01:09:25 pm Shya Biswas wrote:
   Hi all,
   I was wondering if anyone knows how to convert the P21221 to P21212
   spacegroup in HKL2000. I scaled the data set in P21212 in HKL 2000 but I
   got a correct MR solution in P21221 spacegroup.
 
  Shya:
 
  Scaling is done in a point group, not a space group.
 
  The point group P222 contains both space groups P2(1)22(1) and P2(1)2(1)2,
  so your original scaling is correct in either case.
 
  It is not clear from your query which of two things happened:
 
  1) The MR solution kept the same a, b, and c axis assignments but made a
  different call on whether each axis did or did not correspond to a 2(1)
  screw.
  In this case you don't need to do anything to your files.  Just make sure
  that you keep the new space group as you go forward into refinement.
 
  2) The MR solution kept the orginal screw-axis identifications but
  permuted the axes to the standard setting (non-screw axis is labelled c).
  In this case you will need to construct a file containing the permuted
  indices.  For example, the reflection originally labeled  (h=1 k=2 l=3) is
  now
  (h=3 k=1 l=2).  There are several programs that can help you do this,
  including the HKL2000 GUI.   But you do not need to go back into HKL
  if you don't want to.  You could, for example, use the ccp4i GUI to
  select
  - Reflection Data Utilities
- Reindex Reflections
   Define Transformation Matrix by entering reflection transformation
   h=l k=h l=k
 
 
 Ethan
 
 
   I have a script file that
   runs with scalepack but was wondering if there is an easier way to do it
   with HKL2000 gui mode.
   thanks,
   Shya
  
 
  --
  Ethan A Merritt
  Biomolecular Structure Center,  K-428 Health Sciences Bldg
  University of Washington, Seattle 98195-7742
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] P21221 to P21212 conversion

2012-05-07 Thread Ethan Merritt
On Monday, May 07, 2012 02:00:43 pm Phil Jeffrey wrote:
 On Mon, May 7, 2012 at 3:33 PM, Ethan Merritt
   Scaling is done in a point group, not a space group.
 
 My quibble with this statement is that the output reflection data from 
 Scalepack differs depending on what space group you tell it, since 
 systematic absences along h00, 0k0 and 00l in P2x2x2x are not written 
 out.  The number of reflections affected is quite small, of course.

The statement is correct, but the scalepack behavior is IMHO a bad thing.
Therefore I always tell it to scale in the pointgroup (P222 in this case)
and I correct the space group later.

Ethan


-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Off-topic: PDB deposition of multiple structure factor files

2012-04-27 Thread Ethan Merritt
On Friday, April 27, 2012 11:23:13 am Florian Schmitzberger wrote:
 Dear All,
 
 With my most recent PDBe deposition, in addition to the native data, I  
 had intended to deposit the anomalous data, used for structure  
 determination, and make it available for download. This turned out to  
 be less straightforward than I had anticipated, because the current  
 PDB convention is to only allow a single structure factor file for  
 experimental data (usually the native dataset), available for download  
 from the PDB. In my case, the anomalous data were concatenated with  
 the native data into a single cif file (this worked and made sense,  
 because both for both datasets the unit cell dimensions are virtually  
 identical).
 
 I imagine it would be beneficial to be able to make available more  
 than a single structure factor file, including the ones derived from  
 experimental phasing, in the PDB, along with the final coordinates,  
 without concatenating the data into a single file (which may lead to  
 confusion to users when downloaded). Is this anything the PDB is  
 already working to implement in the near future (perhaps via the  
 coming PDBx format)?


The PDB has always been perfectly happy to accept whatever SF files
I send them.  On rare occasions they have gotten mangled in the
process, but that's a separate issue :-)

But re-reading your Email, I see that your concern is that there
is only a single link on the structure's web page for download.
I.e., an issue of retrieval rather than a problem with deposition.

Still, I don't see anything inherently confusing about a file that
contains multiple data sets.  That will be true for any MAD experiment.

Have you asked the PDB whether there is a mechanism for making 
supplemental files visible on the auto-generated web page?

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Off-topic: Supplying PDB file to reviewers

2012-04-25 Thread Ethan Merritt
On Wednesday, April 25, 2012 09:40:01 am James Holton wrote:

 If you want to make a big splash, then don't complain about 
 being asked to leap from a great height.


This gets my vote as the best science-related quote of the year.

Ethan


-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] mtz2cif capable of handling map coefficients

2012-04-05 Thread Ethan Merritt
On Thursday, April 05, 2012 08:25:05 am Francis E Reyes wrote:
 It seems that deposition of map coefficients is a good idea. 
 Does someone have an mtz2cif that can handle this? 

Maybe I missed something.
What is accomplished by depositing map coefficients that isn't
done better by depositing Fo and Fc?

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] mtz2cif capable of handling map coefficients

2012-04-05 Thread Ethan Merritt
On Thursday, April 05, 2012 09:30:25 am Phil Jeffrey wrote:
 Fc doesn't contain the weighting scheme used in the creation of the map 
 coefficients, so Fc would require some sort of program to be run to 
 recreate those for both 2Fo-Fc and Fo-Fc maps.  

The viewers I am familiar with do this for themselves on the fly.
No need to involve additional programs. In fact, generating and storing
map coefficients is not part of my work flow, since none of the programs 
I normally use need them to be pre-calculated.

 By which time you might 
 as well run a single cycle of the refinement program in question to 
 generate new map coefficients - so I don't see the benefit of Fc.

You must use a different tool set than I do.
 
 The map coefficients, on the other hand, are a checkpoint of the maps 
 being looked at by the author at the time of deposition and don't 
 require programs beyond a typical visualization program (i.e. Coot) to view.

But is that a good thing or a bad thing?

I would rather make my own call about weighting and choice of maps,
so I would rather have the Fo and Fc.  Anyhow, Coot reads in and displays
maps just fine from an mtz or cif file containing Fo and Fc but no
map coefficients.   It is true that usually you want to have a value for 
the FOM or other weight available also.

cheers,

Ethan


 Phil Jeffrey
 Princeton
 
 On 4/5/12 12:00 PM, Ethan Merritt wrote:
  On Thursday, April 05, 2012 08:25:05 am Francis E Reyes wrote:
  It seems that deposition of map coefficients is a good idea.
  Does someone have an mtz2cif that can handle this?
 
  Maybe I missed something.
  What is accomplished by depositing map coefficients that isn't
  done better by depositing Fo and Fc?
 
  Ethan
 
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] mtz2cif capable of handling map coefficients

2012-04-05 Thread Ethan Merritt
On Thursday, April 05, 2012 10:48:16 am Oliver Smart wrote:
 
 On Thu, 5 Apr 2012, Ethan Merritt wrote:
 
  On Thursday, April 05, 2012 09:30:25 am Phil Jeffrey wrote:
  Fc doesn't contain the weighting scheme used in the creation of the map
  coefficients, so Fc would require some sort of program to be run to
  recreate those for both 2Fo-Fc and Fo-Fc maps.
 
  The viewers I am familiar with do this for themselves on the fly.
  No need to involve additional programs. In fact, generating and storing
  map coefficients is not part of my work flow, since none of the programs
  I normally use need them to be pre-calculated.
 
 
 
 Ethan,
 
 If you load a mtz file from refmac or BUSTER then this file contains Map 
 Coefficients. Different programs and protocols produce different maps.


I am bowing out of this discussion with apologies for any confusion that
I caused.  

I have realized that there may be a generational difference in
understanding the term map coefficient (or else my poor brain is just
not functioning as well as it ought to).  I thought that the proposal was
to require depositing the equivalent of a ccp4 *.map file, i.e. the 
real-space side of the Fourier transform.  I see now that people are
using map coefficient to mean weighted F, which was not what I originally
understood.

please carry on!

Ethan


 So 
 I second Phil's comment that including map coefficients in deposition is a 
 really good thing. It will enable people to see exactly the maps as seen 
 by the depositor (and to do so in a few years time). Hence we have 
 included map coefficients in 3 recent depositions 3syu, 3urp, 3v56 (using 
 a prototype mtz2cif tool that is not quite ready for release yet).
 
 We have also worked out how to patch ccp4 cif2mtz so that it can do the 
 reverse process see
 
 https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ccp4bb;325e1870.1112
 
 
 Regards,
 
 Oliver
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] very informative - Trends in Data Fabrication

2012-04-01 Thread Ethan Merritt
On Sunday, 01 April 2012, Kendall Nettles wrote:
 
 What is the single Latin word for troll?
 
 Kendall
 
According to Google Translate, it is Troglodytarum.
But I'm dubious.  
I thought trolls lived under bridges rather than in caves.
Except for the ones who inhabit the internet, of course.

Ethan


Re: [ccp4bb] Substituting zero vs. Fc for unobserved reflections

2012-03-27 Thread Ethan Merritt
[Snipped from the full message, which is appended below]
 The program that kept showing me two forms bound was not
 substituting Fcalc for unobserved reflections.  So, I turned on the option
 to substitute Fcalc, and the minor form disappeared; the density looked
 like it did in the second program.  I figured the density that reveals the
 two forms must be correct being that it would be a big coincidence for
 artifactual density to appear that just so happens to fit perfectly our
 added (unmodified) ligand at 1.55 A.  So, I suppose, being that the occupancy
 of the major form is so much higher, by substituting unobserved reflections
 with Fcalc, the major form is being overemphasized, and the minor form
 becomes invisible.

A weighted difference map (mFo - DFc) that does not include the ligand
atoms in Fc at all would be a better guide.  It would not be biased by
the current ligand model [or at least much less biased] and it certainly
would not be sensitive to the modelled occupancies since these atoms
would not be contributing to Fc at all regardless of occupancy.

It is in general more convincing to show difference density from 
an Fo-Fc map with ligands omitted from Fc than it is to show density 
from some variant of 2Fo-Fc with ligands included in Fc.

How complete is your data set?  
Are you trying to deal with more than a few per cent of missing reflections?
If it is only the highest resolution shell that has poor completeness,
have you tried truncating the map calculation to a shell that is complete?

Ethan


On Tuesday, March 27, 2012 01:06:16 pm Gregg Crichlow wrote:
 Please excuse me for bringing up an old issue.  I have an interesting
 example of a difference seen when DFc was substituted for missing
 reflections versus when it wasn't. Maybe others had this experience.  I had
 a structure in which the electron density showed two 'overlapping' ligands
 bound in the same active site.  One was the ligand that was co-crystallized
 with the protein.  The other was the same ligand but with an unintentional
 modification (presumably due to radiation dose).  I was able to discern the
 two forms in the electron density (1.55 A) being that they did not
 completely overlap.  Based on occupancy refinement, the occupancies were
 0.12 and 0.88 (unmodified and modified forms, respectively).  Then one time
 I calculated the map using a second program, and the lower occupancy ligand
 disappeared!  When I calculated maps in the first program, there were again
 two forms visible.  I thought that the difference may be due to the
 difference between substituting unobserved reflections with Fc (or rather
 DFc because of sigma-A weighting) versus omitting them from the Fourier
 transform.  The program that kept showing me two forms bound was not
 substituting Fcalc for unobserved reflections.  So, I turned on the option
 to substitute Fcalc, and the minor form disappeared - the density looked
 like it did in the second program.  I figured the density that reveals the
 two forms must be correct being that it would be a big coincidence for
 artifactual density to appear that just so happens to fit perfectly our
 added (unmodified) ligand at 1.55 A.  So, I suppose, being that the occupancy
 of the major form is so much higher, by substituting unobserved reflections
 with Fcalc, the major form is being overemphasized, and the minor form
 becomes invisible.
   There may be many cases in which substituting Fcalc (or
 DFc) for missing reflections is beneficial. I don't know the mathematical or
 theoretical arguments behind it.  I'm not arguing for one way being
 generally superior to the other, or for one program over another.  However,
 this is one empirical example of it being advantageous not to make this
 substitution.
   When calculating experimentally phased maps, we multiply
 our structure factors by a figure of merit to down-weight reflections with
 less certain phases. Could one consider leaving missing reflections as zero
 analogous to multiplying Fcalc by FOM = 0? (just asking - maybe this is
 faulty logic.) Of course, this would be for the sake of the amplitude
 instead of the phase in this case. If an intensity is not observed, we have
 the ultimate uncertainty regarding its value.
 Maybe some developers will want to use this structure and the corresponding
 data to test DFc vs. "0" vs. DFc multiplied by a specific FOM only used for
 the missing reflections, varying from 0 to 1.  Unfortunately, this structure
 is not yet published (we needed to wait for other experiments to be
 finished) so I cannot yet provide it or the structure factors. However, if
 anyone is interested, feel free to contact me, and when it is published I
 would be happy to let you know the PDB code, if you still want it.
 
 
 
 ***
 Gregg Crichlow
 Dept. of Pharmacology
 Yale University
 P.O. Box 208066
 New Haven, CT 06520-8066
 

Re: [ccp4bb] a question about protein sequences in the PDB

2012-03-26 Thread Ethan Merritt
On Monday, 26 March 2012, Francois Berenger wrote:
 Dear list,
 
 If I take all the fasta files for proteins in the PDB,
 are the sequences complete?
 
 I mean, do they have holes sometimes (missing amino acids)?

In theory the SEQRES records describe the sequence of the
entity that was crystallized, whether or not it is all visible
in the electron density or present in the deposited model.
So normally there should not be any missing internal
residues.  But if the expression construct was not the full
gene sequence, e.g. an N-terminal truncation, then those
N- or C- terminal residues (or whole domains) will not be
listed.

So goes the theory. There are always corner cases.
I remember having a dispute with the PDB long ago about
whether a peptide chain that was known to have undergone
loop cleavage was properly described with a single
chain identifier or with two chain identifiers.  And if the
cleavage involved excision of one or more residues, would
they appear in the SEQRES records anyhow?


 Sorry for the maybe stupid question but I know that sometimes
 the PDB files have missing residues, I am hoping that
 it is not the case with the FASTA files.

I was assuming that the FASTA files you refer to are just
conversions of the SEQRES records.  If not, then all bets are
off.  If the FASTA files are retrieved by gene ID from Uniprot
or some other sequence data base, then they will be complete in
one sense but may not perfectly match what was in the deposited
crystal structure due to cloning artifacts, strain variation,
allelic non-uniformity, etc.

Ethan
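
As a rough illustration of the distinction (a sketch only; standard PDB fixed-column
format is assumed, and the file name is made up), one can compare the SEQRES content
of an entry with the residues actually present in its ATOM records:

    # sketch: compare SEQRES (deposited sequence) with residues present in ATOM records
    from collections import defaultdict

    seqres = defaultdict(int)      # chain -> residue count from SEQRES
    modelled = defaultdict(set)    # chain -> residue numbers seen in ATOM records

    with open("entry.pdb") as f:   # hypothetical file name
        for line in f:
            if line.startswith("SEQRES"):
                seqres[line[11]] += len(line[19:].split())
            elif line.startswith("ATOM"):
                modelled[line[21]].add(line[22:26])

    for chain in sorted(seqres):
        print(f"chain {chain}: {seqres[chain]} residues in SEQRES, "
              f"{len(modelled[chain])} modelled in ATOM records")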

 Regards,
 Francois.
 


Re: [ccp4bb] sudden drop in R/Rfree

2012-03-02 Thread Ethan Merritt
On Friday, 02 March 2012, Regina Kettering wrote:
 Rajesh;
 
 I am not sure that you have a high enough data:refinement parameters ratio to 
 refine TLS. 
 It just adds more parameters to refine that can lead to over-refinement of 
 your model, 
 especially at the 3.3 A. 

I'm afraid you've got this completely backwards.
TLS uses very few parameters, and is especially useful at low resolution.
At 3.3A I would recommend trying a TLS model _instead_ of refining
individual B factors.

NCS restraints also help a lot at low resolution.

So the drop is believable, but...

You should first worry about the "lot of waters" that were placed.
It's there that many extra parameters have been added, perhaps leading
to over-fitting.  I would not expect 3.3A data to justify placement of
more than a handful of waters at most.

If you're parameter counting, you might note that 5 water molecules add 
more parameters than 1 TLS model.  But the TLS model may improve the model
everywhere, whereas the waters will only suppress a few local difference
density peaks.
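
The arithmetic behind that comparison, roughly (x, y, z, B and perhaps an occupancy
per water; 6 + 6 + 8 refinable elements of T, L and S per TLS group):

    \[ 5 \text{ waters} \times (4\text{--}5) \approx 20\text{--}25 \text{ parameters}
       \qquad\text{vs.}\qquad
       1 \text{ TLS group} = 6_T + 6_L + 8_S = 20 \text{ parameters.} \]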

cheers,

Ethan


 
 HTH,
 
 Regina
 
 
 
 
  From: Rajesh kumar ccp4...@hotmail.com
 To: CCP4BB@JISCMAIL.AC.UK 
 Sent: Friday, March 2, 2012 10:54 AM
 Subject: [ccp4bb] sudden drop in R/Rfree
  
 
  
 
 Dear All, 
 
 I have a 3.3 A data for a protein whose SG is P6522. Model used was wild type 
 structure of same protein at 2.3 A.
  
 After molecular replacement, first three rounds of refinement the R/Rf was  
 26/32.8,  27.1/31.72 % and 7.35/30.88 % respectively.
 In the fourth round I refined with TLS and NCS and added water and the R/Rf 
 dropped to 19.34/26.46. It has almost 7% difference. I also see lot of 
 unanswerable density in the map where lot of waters were placed. Model fits 
 to the map like a low resolution data with most of side chains don't have 
 best density.
 
 I was not expecting such a sudden drop in the R/Rfree and a difference is 
 7.2%. 
 I am wondering if I am in right direction. I am not sure if this usual for 
 3.3A data or in general any data if we consider the difference.
  I appreciate your valuable  suggestions.
 
 Thanks
 Raj


Re: [ccp4bb] Extra positive density seen after TLS refinement?

2012-02-18 Thread Ethan Merritt
On Saturday, 18 February 2012, Naveed A Nadvi wrote:
 Dear crystallographers,
 
  
 
 I am fairly new in crystallographic work and structure determination, but I 
 thought this would be the best place to post my questions. We had collected 
 structural data for a protein that diffracted to 3 A. We had used a 
 previously deposited structure (1.5 A) for molecular replacement. Our final 
 structure used NCS restraints refinement over 4 chains within the asymmetric 
 unit. We were able to assign some water molecules using COOT and subsequently 
 removed 'bad waters' manually. We used automated settings when dealing with 
 these water molecules. In all cases these water molecules were found in the 
 same positions as the initial structure (1.5 A) that we had used as a search 
 model. This gave us confidence in the placement of our water molecules. 
 Finally we had run validation tools (MolProbity) and our structure was found 
 to be with Molprobity score within the 100th percentile.
 
  
 
 We then performed a TLS refinement (from TLSMD) to further improve R values. 
 We used the final MolProbity-validated structure using 8 TLS groups and using 
 PureTLS with constant B factor (50). We are observing large positive 
 densities from the subsequent REFMAC5 refinement that are otherwise not 
 observed in the absence of TLS refinement. 

Is it possible that the peaks are not higher in terms of absolute electron 
level,
but only in terms of RMSD?   That is, if the TLS treatment cleans up the map
everywhere, then whatever peaks are left will deviate more from the (now lower)
mean value even though their absolute size is the same.  
In other words, the 3 sigma contours in your first map may be more like 
6 sigma contours in your second (cleaner) map.
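
A made-up numerical example of that effect: the contour level in "sigma" is just the
peak height in units of the map rms,

    \[ \text{level} = \frac{\rho_{\mathrm{peak}} - \langle\rho\rangle}{\mathrm{rms}(\rho)}, \]

so a peak of 0.30 e/A^3 sits at 3 sigma in a map whose rms is 0.10 e/A^3, but at
6 sigma once the rms has dropped to 0.05 e/A^3, even though the peak itself is
unchanged.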

 My questions are:
 
 1) Is TLS suitable for our dataset (3 A)?

There is no universal answer to that.  You just have to test for yourself each 
time.
Certainly TLS can help a lot at 3A for some structures.  In general the more
anisotropy is present, the more it helps to include it in your model somehow -
and TLS is a cheap way to include it in your model.  But if your structure 
does
not have much anisotropy, then adding TLS to describe it won't have much effect.

 2) Is TLS refinement independent of NCS refinement or should I define my NCS 
 based on the 8 TLS groups?

They are not the same thing at all.

 3) Is it normal to see extra positive density after TLS refinement and what 
 does it mean?

See possible explanation above. 

Ethan


 4) We had PEG4000 and Tris in our crystallization buffer. Could these 'blobs' 
 represent these molecules or short water chains? I have attached images of 
 the largest blob.
 
  
 
 Any comments and suggestions would be highly appreciated.
 
  
 
 Kind regards,
 
  
 
 Naveed A Nadvi
 
  
 
 Faculty of Pharmacy,
 
 University of Sydney, Australia
 
 


Re: [ccp4bb] Crystal Structures as Snapshots

2012-02-10 Thread Ethan Merritt
On Friday, February 10, 2012 12:51:03 pm Jacob Keller wrote:
 Interesting to juxtapose these two responses:
 
 James Stroud:
 How could they not be snapshots of conformations adopted in solution?
 
 David Schuller:
  How could that possibly be the case when any structure is an average of all
  the unit cells of the crystal over the timespan of the diffraction
  experiment?

This pair of perspectives is the starting point for the introductory
rationale I usually present for TLSMD analysis.  

The crystal structure is a snapshot, but just like a photographic snapshot
it contains blurry parts where the camera has captured a superposition
of microconformations.  When you photograph an object in motion, those
microconformations correspond to a trajectory purely along time.
In a crystallographic experiment, the microconformations correspond
to samples from a trajectory in solution.  Separation in time has
been transformed into separation in space (from one unit cell to
another).  A TLSMD model tries to reproduce the observed blurring by
modeling it as samples from a trajectory described by TLS displacement.
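
(For readers new to the formalism: in the Schomaker-Trueblood description the
anisotropic displacement of an atom at position r relative to the TLS origin is

    \[ U(\mathbf{r}) = T + A\,L\,A^{T} + A\,S + S^{T}A^{T}, \]

where A is the antisymmetric matrix built from r such that A\lambda = \lambda \times r;
sign and origin conventions vary between programs.)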

The issue of averaging over the timespan of the diffraction experiment
is relevant primarily to individual atomic vibrations, not so much to
what we normally mean by conformations of overall protein structure.

Ethan


-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] off-topic: special format for multiple sequence (protein) alignment

2012-02-02 Thread Ethan Merritt
On Thursday, 02 February 2012, you wrote:
 Dear members,
 
 Apologize for this off-topic question. I am looking for a protein sequence
 alignment tool which is capable to generate a particular output file
 similar to the attached format (please see the attached picture). I have
 been looking at some popular programs but none of them can show the
 conserved amino acids by colored blocks as shown in the attached file.
 
 Maybe some of you have seen some programs can do this? Thank you.

That looks similar to the output of TeXshade, with the shading
mode set to hydropathy.

Ethan


Re: [ccp4bb] writing scripts-off topic

2012-01-23 Thread Ethan Merritt
On Monday, 23 January 2012, Yuri Pompeu wrote:
 Hello Everyone,
 I want to play around with some coding/programming. Just simple calculations 
 from an 
 input PDB file, B factors averages, occupancies, molecular weight, so forth...
 What should I use python,C++, visual basic?

What you describe is primarily a task of processing the text in a PDB file.
I would recommend perl, with python as a more trendy alternative.

If this is to be a springboard for a larger project, then you might choose
instead to use a standard library like cctbx to do the fiddly stuff and 
call it from a higher level language (C or C++).

Ethan
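
As a minimal sketch of the kind of text processing involved (standard PDB fixed
columns; the file name is made up), a python version of the B-factor and occupancy
averaging might look like this:

    # sketch: per-chain mean B factor and occupancy from a PDB file
    from collections import defaultdict

    totals = defaultdict(lambda: [0.0, 0.0, 0])   # chain -> [sum B, sum occ, n atoms]

    with open("model.pdb") as f:                  # hypothetical file name
        for line in f:
            if line.startswith(("ATOM", "HETATM")):
                chain = line[21]
                t = totals[chain]
                t[0] += float(line[60:66])        # B factor, columns 61-66
                t[1] += float(line[54:60])        # occupancy, columns 55-60
                t[2] += 1

    for chain, (b, q, n) in sorted(totals.items()):
        print(f"chain {chain}: {n} atoms, <B> = {b/n:.2f}, <occ> = {q/n:.2f}")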


Re: [ccp4bb] New Faster-than-fast Fourier transform

2012-01-20 Thread Ethan Merritt
On Friday, 20 January 2012, Jim Fairman wrote:
 New Fourier transform algorithm supposedly improves the speed of Fourier
 transforms get up to a tenfold increase in speed depending upon
 circumstances.  Hopefully this will get incorporated into our refinement
 programs.
 
 http://web.mit.edu/newsoffice/2012/faster-fourier-transforms-0118.html

This report is interesting, but it is not immediately obvious to me that
crystallographic transforms are in the class of problems for which
this algorithm is applicable.   

From reading the very non-technical article linked above, I conclude that
a better summary would be "New approach to Fourier approximation provides 
a very cheap (fast) way of identifying and then discarding components that
contribute very little to the signal."  In other words, it seems to be a
way of increasing the compression ratio for lossy image/audio compression
without increasing the amount of time required for compression.

So if you're doing map fitting while listening to streamed mp3 music 
files, perhaps your map inversion will get a slightly larger slice of
the CPU time relative to LastFM.

On the other hand, it is possible that somewhere in here lies a clever
approach to faster solvent flattening.

Ethan
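
A toy illustration of that reading of the paper - keeping only the largest Fourier
coefficients and discarding the rest - is easy to write down (this is plain numpy,
not the MIT algorithm, and the test signal is invented):

    # toy: reconstruct a signal from only its k largest Fourier coefficients
    import numpy as np

    n, k = 1024, 32
    t = np.arange(n)
    signal = (np.sin(2*np.pi*5*t/n) + 0.5*np.sin(2*np.pi*37*t/n)
              + 0.05*np.random.randn(n))

    coeffs = np.fft.fft(signal)
    keep = np.argsort(np.abs(coeffs))[-k:]       # indices of the k largest terms
    truncated = np.zeros_like(coeffs)
    truncated[keep] = coeffs[keep]

    approx = np.fft.ifft(truncated).real
    err = np.linalg.norm(approx - signal) / np.linalg.norm(signal)
    print(f"kept {k}/{n} coefficients, relative error {err:.3f}")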


Re: [ccp4bb] MAD

2012-01-19 Thread Ethan Merritt
On Thursday, 19 January 2012, Ian Tickle wrote:
 So what does this have to do with the MAD acronym?  I think it stemmed
 from a visit by Wayne Hendrickson to Birkbeck in London some time
 around 1990: he was invited by Tom Blundell to give a lecture on his
 MAD experiments.  At that time Wayne called it multi-wavelength
 anomalous dispersion.  Tom pointed out that this was really a misnomer
 for the reasons I've elucidated above.  Wayne liked the MAD acronym
 and wanted to keep it so he needed a replacement term starting with D
 and diffraction was the obvious choice, and if you look at the
 literature from then on Wayne at least consistently called it
 multi-wavelength anomalous diffraction.

Ian:

The change-over from "dispersion" to "diffraction" in MAD protein 
crystallography happened a couple of years earlier, at least with regard 
to work being done at SSRL.  I think the last paper using the term 
"dispersion" was the 1988 Lamprey hemoglobin paper.  The next two papers, 
one a collaboration with Wayne's group and the other a collaboration
with Hans Freeman's group, used the term "diffraction".

WA Hendrickson, JL Smith, RP Phizackerley, EA Merritt. 
Crystallographic structure-analysis of lamprey hemoglobin from 
anomalous dispersion of synchrotron radiation.
PROTEINS-STRUCTURE FUNCTION AND GENETICS, 4(2):77–88, 1988.

JM Guss, EA Merritt, RP Phizackerley, B Hedman, M Murata, 
KO Hodgson, HC Freeman. 
Phase determination by multiple-wavelength X-ray-diffraction - 
crystal-structure of a basic blue copper protein from cucumbers. 
SCIENCE, 241(4867):806–811, AUG 12 1988.

WA Hendrickson, A Pahler, JL Smith, Y Satow, EA Merritt, RP Phizackerley. 
Crystal structure of core streptavidin determined from multiwavelength 
anomalous diffraction of synchrotron radiation. 
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF
AMERICA, 86(7):2190–2194, APR 1989.

On the other hand, David and Lilo Templeton continued to use the term 
anomalous dispersion for at least another decade, describing their 
diffraction experiments exploring polarization effects and other
characteristics of near-edge X-ray scattering by elements all over the
periodic table.

Ethan

 
 Cheers
 
 -- Ian
 
 On 18 January 2012 18:23, Phil Jeffrey pjeff...@princeton.edu wrote:
  Can I be dogmatic about this ?
 
  Multiwavelength anomalous diffraction from Hendrickson (1991) Science Vol.
  254 no. 5028 pp. 51-58
 
  Multiwavelength anomalous diffraction (MAD) from the CCP4 proceedings
  http://www.ccp4.ac.uk/courses/proceedings/1997/j_smith/main.html
 
  Multi-wavelength anomalous-diffraction (MAD) from Terwilliger Acta Cryst.
  (1994). D50, 11-16
 
  etc.
 
 
  I don't see where the problem lies:
 
  a SAD experiment is a single wavelength experiment where you are using the
  anomalous/dispersive signals for phasing
 
  a MAD experiment is a multiple wavelength version of SAD.  Hopefully one
  picks an appropriate range of wavelengths for whatever complex case one has.
 
  One can have SAD and MAD datasets that exploit anomalous/dispersive signals
  from multiple difference sources.  This after all is one of the things that
  SHARP is particularly good at accommodating.
 
  If you're not using the anomalous/dispersive signals for phasing, you're
  collecting native data.  After all C,N,O,S etc all have a small anomalous
  signal at all wavelengths, and metalloproteins usually have even larger
  signals so the mere presence of a theoretical d difference does not make it
  a SAD dataset.  ALL datasets contain some anomalous/dispersive signals, most
  of the time way down in the noise.
 
  Phil Jeffrey
  Princeton
 
 
 
  On 1/18/12 12:48 PM, Francis E Reyes wrote:
 
 
  Using the terms 'MAD' and 'SAD' have always been confusing to me when
  considering more complex phasing cases.  What happens if you have intrinsic
  Zn's, collect a 3wvl experiment and then derivatize it with SeMet or a 
  heavy
  atom?  Or the MAD+native scenario (SHARP) ?
 
  Instead of using MAD/SAD nomenclature I favor explicitly stating whether
  dispersive/anomalous/isomorphous differences (and what heavy atoms for each
  ) were used in phasing.   Aren't analyzing the differences (independent of
  source) the important bit anyway?
 
 
  F
 
 
  -
  Francis E. Reyes M.Sc.
  215 UCB
  University of Colorado at Boulder
 


Re: [ccp4bb] Merging data collected at two different wavelength

2012-01-18 Thread Ethan Merritt
On Wednesday, 18 January 2012, Soisson, Stephen M wrote:
 But if we were to follow that convention we would have been stuck with 
 Multi-wavelength Resonant Diffraction Experimental Results, or, quite simply, 
 MuRDER.

You could switch that to "Multiple Energy Resonant Diffraction Experiment"
but I don't think that would help any.

As to "anomalous" - the term comes from the behaviour of the derivative
 delta_(optical index) / delta_(wavelength)
This term is positive nearly everywhere, but is anomalously negative
at the absorption edge.
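
(In the usual notation this is the wavelength-dependent part of the atomic scattering
factor,

    \[ f(\mathbf{q},\lambda) = f^{0}(\mathbf{q}) + f'(\lambda) + i\,f''(\lambda), \]

where f' and f'' are the dispersive and absorptive corrections that change rapidly,
"anomalously", near an absorption edge.)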

Ethan



 
 
 
 -Original Message-
 From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Jacob 
 Keller
 Sent: Wednesday, January 18, 2012 3:13 PM
 To: CCP4BB@JISCMAIL.AC.UK
 Subject: Re: [ccp4bb] Merging data collected at two different wavelength
 
 This begs the question* whether you want the lemmings to understand
 you. One theory of language, gotten more or less from Strunk and
 White's Elements of Style, is that the most important feature of
 language is its transparency to the underlying thoughts. Bad language
 breaks the transparency, reminds you that you are reading and not
 simply thinking the thoughts of the author, who should also usually be
 invisible. Bad writing calls attention to itself and to the author,
 whereas good writing guides the thoughts of the reader unnoticeably.
 For Strunk and White, it seems that all rules of writing follow this
 principle, and it seems to be the right way to think about language.
 So, conventions, even when somewhat inaccurate, are important in that
 they are often more transparent, and the reader does not get stuck on
 them.
 
 Anyway, a case in point of lemmings is that once Wayne Hendrickson
 himself suggested that the term anomalous be decommissioned in favor
 of resonant. I don't hear any non-lemmings jumping on that
 bandwagon...
 
 JPK
 
 *Is this the right use of beg the question?
 
 
 
 
 
 On Wed, Jan 18, 2012 at 1:57 PM, Phoebe Rice pr...@uchicago.edu wrote:
 
  Can I be dogmatic about this ?
 
 I wish you could, but I don't think so, because even though those
 sources call it that, others don't. I agree with your thinking, but
 usage is usage.
 
  And 10,000 lemmings can't be wrong?
 
 
 
 


Re: [ccp4bb] RMSD of side chains

2012-01-13 Thread Ethan Merritt
On Friday, 13 January 2012, Appu kumar wrote:
 Dear ccp4 users,
Would you please guide me how to calculate
 the RMSD of side chains alone without considering C-alpha backbone.
 Is/are there any program/programs availble which do this job. I want
 to know the RMSD of side chains for  protein comparison.

What is the question that you are trying to answer?
If you are going to disregard the mainchain position, then
I would guess that you'd be better off comparing rotamer
classes than comparing coordinates.

Ethan 


 
 Thank you in advance.
 Appu
 


Re: [ccp4bb] RMSD of side chains

2012-01-13 Thread Ethan Merritt
On Friday, January 13, 2012 09:07:07 am Appu kumar wrote:
 Firstly thanks to Robert Nicholls for making me aware of the software
 necessary for side chain RMSD calculation. I have installed and now going
 through manual to use it for exploiting the structural differences. Thanks
 a lot.
 
 Secondly, for Ethan Merritt, I am seeking the information for comparing the
 side chains RMSD for better comparison of structure. Please correct me if i
 am wrong, i want to elaborate more on what i am thinking. If we have refine
 the structure well so that issue of rotamers are  fixed

Sorry, I don't know what you mean when you say "the issue of rotamers are 
fixed".

 , then it is
 possible to take the advantage  of side chain orientation for correctly
 understanding the trivial differences between homologous proteins  and such
 differences harbouring good piece information for understanding protein
 structure-function relationship. Any kind of suggestion would be highly
 appreciated.

Let me put it this way.  Suppose you were reading a paper about someone
else's structures.  Which of these two statements would be more useful:
  1) The RMSD for sidechain atoms between apo and holo was 0.678 Å.
or
  2) Only two residues exhibited a significant change of conformation:
 the Asn XXX carboxamide flipped 180 degrees allowing ND to act as 
 H-bond donor to ligand atom FOO;  the Lys YYY sidechain occluded
 the ligand binding site in the apo structure but extends into the
 solvent when the ligand is bound.

Your comparison apparently involves a pair of homologs rather than a
pair of holo/apo structures, but I suggest to you that RMSD is even
more useless in this case.  For residues where the two sequences are
not identical, how do you even calculate an RMSD for sidechain atoms?

Ethan

 
 Thank you
 Appu
 
 On 13 January 2012 21:53, Ethan Merritt merr...@u.washington.edu wrote:
 
  On Friday, 13 January 2012, Appu kumar wrote:
   Dear ccp4 users,
  Would you please guide me how to calculate
   the RMSD of side chains alone without considering C-alpha backbone.
   Is/are there any program/programs availble which do this job. I want
   to know the RMSD of side chains for  protein comparison.
 
  What is the question that you are trying to answer?
  If you are going to disregard the mainchain position, then
  I would guess that you'd be better off comparing rotamer
  classes than comparing coordinates.
 
 Ethan
 
 
  
   Thank you in advance.
   Appu
  
 
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Always Modelling Anomalous Signal

2012-01-10 Thread Ethan Merritt
On Tuesday, January 10, 2012 02:46:21 pm Ian Tickle wrote:
 Jacob,
 
 Actually the R factors including the Bijvoet pairs would be higher,
 because the uncertainties in F(+) and F(-) are higher than that of
 F(mean) by a factor of about sqrt(2).  R factors will always be higher
 for unmerged data because averaging always reduces the uncertainty.
 This means that we are in effect 'cheating' by throwing away the
 relatively imprecise anomalous differences and getting falsely lower R
 factors as a result!  But as you imply the model would improve if we
 included the anomalous data in the refinement (assuming of course that
 it is meaningful data). 

This has been an option in refmac refinement for some while now.
If you are using the GUI, select the option for using SAD data directly
rather than the default option using no prior phase information.
Of course you must also make sure that appropriate scattering
factors are picked up by the program.

In my limited experience this has not led to improved refinement if
the anomalous differences are due only to sulfur, but has on occasion
led to noticeable improvement for true SeMet/SAD data.

Ethan

 This just demonstrates that low R factors do
 not necessarily equate to a better model - especially if you
 deliberately throw away the less precise data!  The model would
 improve (marginally) because the anomalous differences would obviously
 provide additional information about the anomalous scatterers and
 therefore increase their precision, but wouldn't affect the precision
 of the lighter atoms.  But is imprecision in the parameters of the
 heavy (or heavier) atoms usually an issue? - since these have bigger
 real scattering factors they will be more precisely determined than
 the lighter atoms anyway.  So I don't think you would gain very much,
 except maybe more truthful R factors!
 
 Cheers
 
 -- Ian
 
 On 10 January 2012 20:00, Jacob Keller j-kell...@fsm.northwestern.edu wrote:
  Dear Crystallographers,
 
  it seems to me that on a certain level we are always throwing away
  (sort of) about half of our data when we merge Bijvoet pairs--why
  shouldn't we keep them separate, since we know that they should be a
  little bit different, especially in light of the higher multiplicities
  which are more common now? Shouldn't modelling them give better
  R-values, and wouldn't it just be more true? I guess a sort of proof
  for this is that sulfurs are almost always detectable on anomalous
  difference maps, implying that we are actually measuring those
  differences accurately enough to see them (I don't think these things
  can arise from model bias, as anomalous differences are not modeled.)
  At least maybe at the final steps of refinement...?
 
  JPK
 
  --
  ***
  Jacob Pearson Keller
  Northwestern University
  Medical Scientist Training Program
  email: j-kell...@northwestern.edu
  ***
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Sub-angstrom resolution

2012-01-09 Thread Ethan Merritt
On Monday, January 09, 2012 11:37:23 am Ed Pozharski wrote:
 On Mon, 2012-01-09 at 18:15 +, Theresa H. Hsu wrote:
  Dear crystallographers
  
  A theoretical question - can sub-angstrom resolution structures only be 
  obtained for a limited set of proteins? Is it impossible to achieve for 
  membrane proteins and large complexes?
  
  Theresa
 
 On the matter of large proteins.
 
 Let's say your molecule is so big, the unit cell parameters are
 300x300x300 A.  To obtain 1A data, you need reflections with miller
 indices of ~300.  For these to be measurable, you need, I presume, ~300
 unit cells in each direction (otherwise you don't even have a formed
 Bragg plane).  300A x 300 ~ 10^5 A, or 10 micron.  So it seems to me
 that with large molecules you would essentially hit the crystal size
 limit.  In reality, to get any decent data one would need maybe 3000
 unit cells, or 100 micron crystal.  While such crystals could
 theoretically grow (maybe in microgravity), it is highly unlikely that
 the whole crystal will be essentially a single mosaic block.  Simply
 because large proteins are always multi-domain, and thus too flexible.

The ground-breaking work by Chapman et al. using the Stanford FEL to
record diffraction from nanocrystals of Photosystem I would seem to
constitute an encouraging counter-example:
  Nature [2011] doi:10.1038/nature09750

 So I'd say while everything is theoretically possible, for very large
 proteins the probability of getting submicron resolution is exceedingly
 small.

It remains to be seen what resolution might ultimately be achieved by
nanocrystal experiments.  As I understand it, the resolution of the 
work to date has been limited by the apparatus rather than by the crystals.

Ethan
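
Redoing the back-of-envelope estimate from the quoted message as a sketch (the
factor of ten between "barely formed Bragg planes" and "decent data" is taken from
the message above; everything else is illustrative):

    # sketch: minimum crystal size implied by the estimate quoted above
    def minimum_crystal_size_um(cell_edge_A, d_min_A, cells_per_index=1):
        h_max = cell_edge_A / d_min_A           # highest Miller index needed
        n_cells = h_max * cells_per_index       # crude coherence requirement
        return n_cells * cell_edge_A * 1e-4     # Angstrom -> micron

    print(minimum_crystal_size_um(300, 1.0))                       # ~10 micron
    print(minimum_crystal_size_um(300, 1.0, cells_per_index=10))   # ~100 micron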


-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Structure Determination combining X-ray Data and NMR

2012-01-06 Thread Ethan Merritt
On Friday, January 06, 2012 09:30:22 am Nat Echols wrote:
 2012/1/6 Pete Meyer pame...@mcw.edu:
  However, at 3.2 Angstroms I'd recommend against using atomic B-factors -
  the rule of thumb for this is 2.8 Angstroms for atomic B-factors (or
 at least it was back in the day).  It might help to use an overall
  B-factor combined with one (or a few) TLS groups.
 
 This may be true for older software which restraints B-factors only to
 bonded atoms, but it is not the case in Phenix*, which takes into
 account all nearby atoms, not just bonded ones.  The result is that
 individual B-factor refinement is very stable at low resolution - we
 don't know what the limit is, but it routinely works very well at 4A.

Unfortunately, "stable" and "statistically correct" are two very different
criteria.  It is quite possible to have a stable refinement that produces
nonsensical, or at least unjustifiable, B factors.   Actually this caveat
applies to things other than B factors as well, but I'll stay on topic.

At last year's CCP4 Study Weekend I presented a statistical approach to 
deciding what treatment of B could be justified at various resolutions.
"To B or not to B?"  The presentations from that meeting should appear in a
special issue of Acta D soon.

Based on the set of representative cases I have examined, I am willing
to bet that with the limited obs/parameter ratio in the case at hand,
a model with individual Bs would turn out to be statistically unjustified
even if the refinement is stable.  A TLS model is more likely to be
appropriate.

cheers,

Ethan




 Of course the performance is still dependent on solvent content, NCS,
 etc., but it is very rare that grouped B-factor refinement actually
 works better.
 
 -Nat
 
 * I think Refmac may do something similar, but I haven't tried this
 recently.  I would be very surprised if it did not work well at 3.2A,
 however.
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Structure Determination combining X-ray Data and NMR

2012-01-06 Thread Ethan Merritt
On Friday, January 06, 2012 11:15:11 am Ed Pozharski wrote:
 On Fri, 2012-01-06 at 10:48 -0800, Ethan Merritt wrote:
 
  A TLS model is more likely to be appropriate.
 
 A quick clarification request if I may:
 
 We have all seen how well the multi-group TLS models seem to match the
 B-factor variation along the chain.  Is this in your opinion how such
 model may be really effective, by incorporating most of the B-factor
 variation into ~100 TLS parameters?

I have run statistical analysis of alternative models on various
structures in the resolution range 2.8 - 4.0 A.   For some of these,
I found that a full 1-Biso-per-atom model was indeed statistically
justified.   For most, however, a TLS model was better.  For some,
a hybrid Biso + TLS model was better than either alone.  So this really
should be decided on a case by case basis rather than trying to come
up with a single rule of thumb.

Now as to how many TLS groups a model should be partitioned into, that 
varies all over the place and is clearly a consequence of the individual
lattice packing.  For some structures with loose packing (as I interpret
the cause), a single-group TLS model with uniform constant per-atom B
is significantly better than a model with a separate B factor for each
atom but no TLS component.  Adding additional TLS groups does not actually
help that much. To me this means that the largest factor contributing to
the ADPs is the overall displacement of the whole molecule within the
lattice, which is strongly anisotropic.  The single-group TLS model
describes this anisotropy well enough, while any number of isotropic B
factors does not.

Those cases where the individual B factor option tests out best correspond,
as I interpret it, to relatively rigid lattice packing.  In these crystals
the overall anisotropy is very low, so TLS models are not the right 
formalism to use in describing the distribution of ADPs.  Perhaps 
normal-mode models would be better;  it is hard to draw conclusions from
the very small number of normal-mode refinements reported to date.


 And a question:
 
 Given that the B-factors for side chain atoms will be generally higher,
 do you know if creating two separate sets of TLS parameters for
 backbone / side chains improves things?

That is a question that I am currently working on. I don't think that 
two sets of TLS parameters will turn out to be a good way to handle it.
I am more attracted to the idea of applying a TLS description on top of
a fixed  a priori model for B variation along the sidechain.  This 
approach is inspired by the per-amino acid targets for varying B along 
the sidechain that were developed by Dale Tronrud for use in TNT.

cheers,

Ethan


 Thanks,
 
 Ed.
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] linux upgrade preferences for CCP4

2011-12-21 Thread Ethan Merritt
On Wednesday, 21 December 2011, Paul Kraft wrote:
 hello,
 I'm considering upgrading my linux software from CENTOS5 to perhaps Fedora or 
 UBUNTO. Does anyone have an opinion about the best linux version to upgrade 
 to for not only CCP4 but also for general robustness and for the best 
 standard apps..Thanks

Well, since you ask...

My preference is for Mandriva, followed by Suse.
IMHO neither Fedora nor RHEL (of which CENTOS is a clone) are as
suitable right out of the box for use either at home or in the
lab.  We (lab scientists) are just not in their target audience,
so their packaging and configuration defaults are not the best
for our use.  Oh, and I much prefer a KDE desktop, which is only
an afterthought at best in the distros you mention.

Ethan


Re: [ccp4bb] Reference for Resolution Cutoffs

2011-12-06 Thread Ethan Merritt
On Tuesday, December 06, 2011 09:13:03 am Jacob Keller wrote:
 Dear Crystallographers,
 
 I hate to broach this subject again due to its wildly controversial
 nature, but I was wondering whether there was any reference which
 systematically analyses resolution cutoffs as a function of I/sig,
 Rmerge, Rmeas, Rpim, etc. I strongly dislike Rmerge/Rcryst for
 determining cutoffs, for obvious reasons--and especially for datasets
 of higher multiplicity--but nevertheless it is a ubiquitously-reported
 statistic, and one therefore has to make an argument against using it.

What is your question, exactly?
I don't follow the logic that because a statistic is reported, one
must therefore argue against it.

 Hopefully this could be done by pointing to a definitive reference--or
 am I stuck with a convention versus the truth? Maybe the ACA or
 similar could make a public anti-Rmerge proclamation about it, to make
 it easier for us?

Acta published at one point a guideline as part of the instructions
to authors, but the state of the art passed it by very soon after.
I suspect that is the inevitable fate of any such broad-brush proclamation.

Ethan

 
 Also, more generally, it seems that the refinement programs are now
 better able to discount lousy high-res data, so why not leave the
 choice to those programs, and just give them all of the data to the
 edge of the detector, especially since our computational and data
 storage capacities are now completely sufficient for that? One could
 then use some other metric for the goodness of the structure, such as
 what bin crosses the Rfree = 40% mark or something.
 
 One could push this even further and, as has been mentioned on this
 list before, just give the refinement program all of the intensities
 of the voxels in the 3D dataset?
 
 Jacob
 
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Reference for Resolution Cutoffs

2011-12-06 Thread Ethan Merritt
On Tuesday, December 06, 2011 11:43:05 am Jacob Keller wrote:
 Hi Ethan, thanks for pushing me to clarify--see below.
 
  I hate to broach this subject again due to its wildly controversial
  nature, but I was wondering whether there was any reference which
  systematically analyses resolution cutoffs as a function of I/sig,
  Rmerge, Rmeas, Rpim, etc. I strongly dislike Rmerge/Rcryst for
  determining cutoffs, for obvious reasons--and especially for datasets
  of higher multiplicity--but nevertheless it is a ubiquitously-reported
  statistic, and one therefore has to make an argument against using it.
 
  What is your question, exactly?
 
 The question is: is there a reference in which Rmerge has been
 thoroughly, clearly, and authoritatively discredited as a data
 evaluation metric in the favor of Rmeas, Rpim, etc., and if so, what
 is that reference?

Why assume that any of those are a valid criterion for discarding data?

I would argue that a better approach is to ask whether the data measured
in the highest resolution shell is contributing positively to the map
quality. The R_{whatever} may be an imperfect predictor of that, but is
not by itself the property of interest.
 
In other words, there are two separate issues in play here:

1) Is there a best measure of data quality in the abstract
   (i.e. it can be calculated before you solve the structure or
   calculate a map)?

2) Is there a standard statistic to choose what data is used for
   refinement?

If you just want to argue which R_{whatever} best serves to address 
the first issue, carry on.

If you are worried about the second issue, IMHO none of these 
quantities are appropriate.  They address entirely the wrong question.
We all know that good data does not guarantee a good model, and noisy
data may nevertheless yield a valid model. So you need a better reason
to discard data than "it's noisy".

Ethan


  I don't follow the logic that because a statistic is reported, one
  must therefore argue against it.
 
 Let me say it clearer: when there is a conventional, standardized
 method that one wants to abandon in favor of a better method, in
 practice one has to make an argument for the new one and against the
 old one. This is in contrast to continuing to use the conventional
 method, which, even if apodictically surpassed by the newer method, de
 facto needs no justification. So, in the current example, if you want
 to use Rmeas or Rpim and not even report Rsym/merge, it will ruffle
 feathers, even though the former is certainly superior.

 
 Sorry for the confusion,
 
 Jacob
 
 ***
 Jacob Pearson Keller
 Northwestern University
 Medical Scientist Training Program
 email: j-kell...@northwestern.edu
 ***
 

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] raw data deposition

2011-10-28 Thread Ethan Merritt
On Friday, October 28, 2011 08:29:46 am Boaz Shaanan wrote:
  Besides, I thought that by now there are some standards on how data should 
 be processed 
  (this has been discussed on this BB once every few months, if I'm not 
 mistaken). 

If this is true, I must not have got the memo!

I hear differences of opinion among senior crystallographers, even just
considering discussions at our local research meetings, let alone in the
context of world-wide practice.

- Where to set a resolution cutoff?  
- Use or not use a criterion on Rmerge (or Rpim or maximum scale factor or 
  completeness in shell)?
- Use all images in a run, or limit them to some maximal amount of decay?
- Empirical absorption correction during scaling?
- XDS? HKL? mosflm?

  Isn't that part of the validation process that so many good people have 
 established? 
  Also, to the best of my knowledge (and experience) referees (at least of 
 some journals) 
  are instructed to look into those issues these days and comment about them, 
 aren't they?
 
   Cheers,
 Boaz

As to what reviewers have access to, at best one sees a Table 1 with
summary statistics.  But rarely if ever do we see the protocol or 
decisions that went into the processing that yielded those statistics.

And more to the point of the current issue, a reviewer without access
to the original diffraction images cannot possibly comment on 
- Were there unexplained spots that might have indicated a supercell
  or other strangeness with the lattice?
- Evidence of non-merohedral twinning in the diffraction pattern?
- Was the integration box size chosen appropriately?
- Did the diffraction data clearly extend beyond the resolution limit
  chosen by the authors?

I hasten to add that I am not advocating for a requirement that the
diffraction images be sent to reviewers of a manuscript!  
But these are all examples of points where current opinion differs, 
and standard practice in the future may differ even more.
If the images are saved, then the quality of the data extracted from
them may improve using those not-yet-developed programs and protocols.  

So there is, to me, clearly some value in saving them.
How to balance that value against the cost? - that's another question.

Ethan

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] To archive or not to archive, that's the question!

2011-10-28 Thread Ethan Merritt
On Friday, October 28, 2011 02:02:46 pm Gerard DVD Kleywegt wrote:
  I'm a tad disappointed to be only in fourth place, Colin! 
  What has the Pope ever done for crystallography?

   http://covers.openlibrary.org/b/id/5923051-L.jpg

-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] IUCr committees, depositing images

2011-10-27 Thread Ethan Merritt
On Wednesday, 26 October 2011, James Holton wrote:
 Of course, if we are willing to relax the requirement of validation and 
 curation, this could be a whole lot easier.  In fact, there is already 
 an image deposition infrastructure in place!  It is called TARDIS:
 
 http://tardis.edu.au/
 
 Perhaps the best way forward would be for the PDB to introduce a new 
 field for one or more TARDIS ids in a PDB deposition?  It would be 
 optional at the first, but no doubt required in the future.

As I understand it, TARDIS is just an indexing system.
You're still on your own to actually store the images.
The TARDIS setup will cough up the information that your images
are stored on a machine named pony.lbl.gov, or at least they were 
at the time you registered them for indexing, where they supposedly
can be retrieved using tag #XYZ.

But so far as I know you would still be at the mercy of pony.lbl.gov
going up in flames, or being renamed twinkle.lbl.gov, or being 
decommissioned when the next budget crunch hits.

For that matter, I don't know what the provision or expectation
is that anyone outside your institution could see or access the
machine holding the set of files that TARDIS told them were there.

If I've got this wrong, perhaps Ashley Buckle can chime in with
an update on TARDIS.

Ethan

