Re: [ccp4bb] Contact Surface Area

2009-09-25 Thread Brad Bennett
Hi Dan-
There's an online server and program called DPX which will calculate the
depth of all the atoms in your PDB file. An atom's depth is defined as its
distance to the nearest solvent-accessible atom, with depth = 0 meaning the
atom is not buried at all (most likely at the protein surface), whereas
depth > 3 A means the atom is most likely buried. I
understand these are not the surface area values you are after, but it
does at least give you some quantitative measure of whether an atom is buried or
not. I've run depth calculations for several protein structures and upon
visual inspection of the structures, the depth values seem to be an accurate
descriptor of an atom's burial and/or exposure. Maybe this will tell you
something about your loop in the 2 states.
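
A minimal sketch of the depth definition above, for illustration only. It is
not DPX's actual implementation: the function name atom_depths is hypothetical,
and the per-atom exposure flags are assumed to come from a separate
solvent-accessibility calculation.

```python
import math


def atom_depths(coords, exposed):
    """Return one depth per atom.

    coords  -- list of (x, y, z) tuples in Angstrom
    exposed -- list of bools, True if the atom is solvent accessible
               (assumed to be computed elsewhere)
    Depth is 0.0 for exposed atoms, otherwise the distance to the
    nearest exposed atom.
    """
    surface = [c for c, e in zip(coords, exposed) if e]
    if not surface:
        raise ValueError("no solvent-accessible atoms given")
    depths = []
    for c, e in zip(coords, exposed):
        if e:
            depths.append(0.0)
        else:
            depths.append(min(math.dist(c, s) for s in surface))
    return depths


# Toy example: one buried atom between two exposed atoms.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
exposed = [True, False, True]
depths = atom_depths(coords, exposed)
```

Comparing the per-atom depths of the loop in its two states would then give a
rough, atom-level picture of what becomes buried on folding.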

Alternatively, the authors of this program may very well have developed a
sister program that calculates exactly what you're wanting.

HTH-
Brad

On Fri, Sep 25, 2009 at 3:19 PM, Daniel Bonsor  wrote:

> Sorry I should have made this clearer in my original post. Thanks anyway to
> people who have responded thus far.
>
> I am trying to calculate the buried surface area of a loop which folds from
> a disordered to an ordered state.
>
> I am looking for a program that will allow me to calculate;
> (1) the buried surface area per atom (not residue)
> or
> (2) something that returns the buried surface area of apolar and polar
> atoms.
>
> It has to be per atom and not per residue.
>
> Thanks again in advance.
>
>
> Dan
>


Re: [ccp4bb] Contact Surface Area

2009-09-25 Thread Daniel Bonsor
Sorry I should have made this clearer in my original post. Thanks anyway to
people who have responded thus far.

I am trying to calculate the buried surface area of a loop which folds from
a disordered to an ordered state. 

I am looking for a program that will allow me to calculate;
(1) the buried surface area per atom (not residue) 
or
(2) something that returns the buried surface area of apolar and polar atoms.

It has to be per atom and not per residue.

Thanks again in advance.


Dan


[ccp4bb] Contact Surface Area

2009-09-25 Thread Daniel Bonsor
I am trying to calculate the contact surface area of a loop. Using AreaIMol
I only get the overall contact surface area per residue. Is there any way to
get it per atom, or does anyone know of a program (online/software) which
will perform this task?

Thanks in advance


Dan

Daniel A. Bonsor,
Boston Biomedical Research Institute,
64 Grove Street,
Watertown,
MA 02472 USA
Tel: +1 617.658.7845


Re: [ccp4bb] perfect twin test

2009-09-25 Thread Pavel Afonine

Hi Ben,


On 9/24/09 8:15 AM, Ben Flath wrote:

I will reprocess my data and try and use the UCLA anisotropy server (open to 
suggestions here).
  


Once you have solved and refined your structure, it would be great if you
deposited your original data in the PDB (not manipulated by, for example,
anisotropy correction), along with the data that you used to obtain
your best model and model-to-data statistics. That way, if
the methodology improves or changes, the developers still have
access to the original problem, and not the "corrected" one.


Pavel.


[ccp4bb] Practical Course in Biomolecular Modelling

2009-09-25 Thread Patrick Sticher

Dear colleagues,

please be informed that online applications are still accepted for the 
following course until October 16, 2009:


8TH NCCR PRACTICAL COURSE IN BIOMOLECULAR MODELLING

January 10 - 15, 2010
Kandersteg, Switzerland
http://www.structuralbiology.uzh.ch/course2010.asp

Course topics include
Simulation techniques, force-field development, conformational search, 
computation of free energy and entropy, treatment of electrostatic 
forces, simulation of folding, comparison of simulation with experiment


This course is primarily directed to PhD students and postdocs from 
experimental structural biology groups wishing to learn more on 
biomolecular modelling. The course format will include morning lectures 
and late-afternoon/early evening tutorials, and provide ample 
opportunities for discussions with experts and fellow participants. 
Participants will be invited to bring their own problems for tutorials and/or 
discussion. The course will be organized as a winter retreat in the 
Swiss Alps offering a stimulating learning atmosphere with the 
afternoons available for informal participation in discussions, reading 
and self-study or recreational activities in the area.


Interested candidates are encouraged to apply online on 
http://www.structuralbiology.uzh.ch/course2010_application.asp. 
Application deadline will be October 16, 2009. We will be able to accept 
20 participants to this course.


Best regards,
Patrick Sticher

--
Visit the NCCR on the Internet
www.structuralbiology.uzh.ch

Dr. Patrick Sticher Moser
NCCR Scientific Officer
Institute of Biochemistry
University of Zürich
Winterthurerstrasse 190
CH - 8057 Zürich

Phone   +41 / (0)44 / 635 54 84
Fax +41 / (0)44 / 635 59 08
Mail    stic...@bioc.uzh.ch


Re: [ccp4bb] Rfree in similar data set

2009-09-25 Thread Ian Tickle
Hi Lijun

One important point your summary didn't cover - the test set may not
actually be the same even though you think it is!  What I mean is that
the test sets for different crystals may have the same indices but may
not sample exactly the same points in reciprocal space either due to
cell parameter changes or to rigid-body movements, or in fact anything
which causes the crystals to be non-isomorphous.  Mike's original
question I think referred to switching datasets when the crystals are
chemically identical, but I suspect this problem arises more often when
a ligand is soaked in and you want to re-refine the complex using the
refined apo model as a starting point with the new data - the question
is does it matter whether or not you use the 'same' test set (i.e. same
indices)?

We frequently observe quite large cell parameter changes (up to 10% in
extreme cases) on soaking and freezing (which are hardly reproducible
treatments of the crystals!).  Let's say a cell parameter has changed by
only 2%, which is fairly typical for many ligand soaks.  The question is
at what resolution does this cause a test set reflection in the
protein-ligand data to become 'contaminated' by a reflection in the
working set of the apo data which differs by 1 in the index in the
direction of that cell parameter?  I think 'contamination' starts to
occur when the positions of the reflections in reciprocal space differ
by less than half a reciprocal cell length, i.e. the points in one
reciprocal lattice appear at points closer than the half-index positions
of the other lattice.  If the change is 2% in the 'a' parameter that
means the shift in the lattice is 1 rlu in 50, so it reaches 1/2 rlu at h=25.
Let's say the cell parameter a=100 Ang, so what resolution is h=25?  The
answer is 100/25 = 4 Ang, so that means all test set reflections with
resolution higher than 4 Ang are 'contaminated' by the working set of
the other crystal to a greater or lesser degree.  If the data go to 2
Ang, that's ~ 90% of the data - and that's from only a 2% change.
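
The arithmetic above can be sketched as follows. This is an illustration under
stated assumptions, not part of any CCP4 program: the function name
contamination_onset is hypothetical, and the fraction of affected data assumes
reflections are uniformly distributed in reciprocal space, so that the count
inside resolution d scales as 1/d^3.

```python
def contamination_onset(frac_change, cell_length, d_min):
    """Estimate where test-set 'contamination' starts.

    frac_change -- fractional change in one cell parameter (0.02 for 2%)
    cell_length -- that cell parameter in Angstrom
    d_min       -- high-resolution limit of the data in Angstrom
    Returns (h_half, d_onset, fraction):
      h_half   -- index at which the lattice shift reaches 1/2 rlu
      d_onset  -- resolution corresponding to that index
      fraction -- share of reflections at higher resolution than d_onset
    """
    h_half = 0.5 / frac_change          # shift of frac_change rlu per index
    d_onset = cell_length / h_half      # resolution of the (h_half, 0, 0) row
    if d_min >= d_onset:
        fraction = 0.0                  # data stop before the onset
    else:
        # Reflections within resolution d scale as 1/d^3 (assumption).
        fraction = 1.0 - (d_min / d_onset) ** 3
    return h_half, d_onset, fraction


# The worked example from the text: 2% change, a = 100 Ang, data to 2 Ang.
h_half, d_onset, fraction = contamination_onset(0.02, 100.0, 2.0)
```

With these inputs the onset is at h = 25, i.e. 4 Ang, and the affected fraction
comes out at 1 - (2/4)^3 = 0.875, in line with the "~ 90%" quoted above.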

So it's just as well that refinement to convergence removes the
resulting Rfree bias, since I suspect very few people resort to
'shaking' (or whatever extreme measures are recommended) their
protein-ligand models before refinement!

Cheers

-- Ian

> -Original Message-
> From: Lijun Liu [mailto:li...@uoxray.uoregon.edu]
> Sent: 24 September 2009 19:00
> To: Ian Tickle
> Cc: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] Rfree in similar data set
> 
> Sticking to the same test set is a great and practical idea!  It lowers
> the chance to get biased from "self-validation".  However, logically,
>
> 0) based on Bragg's law and HKL<->XYZ theory, no reflections are truly
> free from the others from the same crystal.  But when the reflection
> number is larger than the number of parameters minus other restriction
> conditions, you have more degree of freedom, statistically.  Uncertainties
> contributed from many aspects increase the freedom.   Even though, free
> test set is not purely free.
>
> 1) if the refined model is optimal/quasi-optimal, the model is then
> supposed to be consistent to the data used for test set too;  or the model
> is far from optimal.  In this regard, switching test sets/using different
> test sets will not be a problem for this kind of "ideal" case---enough
> refinement cycles should be able to bring consistent models.
>
> 2) keeping the same reflections for test set will leave these reflections
> lost any chance to contribute the minimization process in a direct
> fashion, which itself causes a kind of bias.  From the point of data, if
> any data are excluded, no matter randomly or not, from calculation,
> artificial bias would be resulted!   If the initial model is biased, it
> will be biased forever if it caused due to the exclusion of test set (this
> sounds more true when with low resolution data).
>
> So, a judgement may need be based on your data!  At the "end" (I mean you
> are going to stop) of a smooth refinement, switching test sets should not
> be a "huge" problem, or the model is too wrong!
>
> Back to Mike's question:  I suggest you keep the same test set, since your
> data were from the exactly same crystal.  At least it saves your
> convergence time.
> 
> Lijun Liu
> 
> On Sep 24, 2009, at 10:24 AM, Ian Tickle wrote:
>
>   -Original Message-
>   From: Dale Tronrud [mailto:det...@uoxray.uoregon.edu]
>   Sent: 24 September 2009 17:21
>   To: Ian Tickle
>   Cc: CCP4BB@JISCMAIL.AC.UK
>   Subject: Re: [ccp4bb] Rfree in similar data set
>
>   While I agree with Ian on the theoretical level, in practice
>   people use free R's to make decisions before the ultimate model
>   is finished, and our refinement programs are still limited in
>   their abilities to find even a

[ccp4bb] statistics

2009-09-25 Thread Matthias Zebisch
Dear everybody!

I recently tried out the anisotropy server at 
http://www.doe-mbi.ucla.edu/~sawaya/anisoscale/

I also see that there is much discussion going on about the correctness of this 
method. In any case, does anyone know a tool that gives me the 
data collection statistics that I normally record from the SCALA log file?
I am talking about multiplicity, completeness, I/sI etc all _AFTER_ ellipsoidal 
truncation and anisotropic scaling.

My crystal diffracts in one direction to at least 1.8A whereas in the other 
directions it is rather 2.0 to 2.1. The spacegroup is C2.

Thank you a lot,

Matthias


Re: [ccp4bb] Alignment of Electron Density Map for structures that have different space groups

2009-09-25 Thread Matthias Zebisch
Dear Milya!

I had this problem before too, when showing active site ligands from the 
same view (i.e. after superposition of totally different crystals). 
You can use the superposition matrix from your PDB superposition and apply it 
to your map using the program MAPROT in the "INVERT" mode with "SYMMETRY P1" 
as the space group.

Matthias

PS: Here is the answer I got from Ingo at that time:


in maprot, you need to always give symmetry P1. if it is between two
very similar cells you will not notice that the density is a little bit
offset, but it starts if they are non-isomorphous crystals, and even
worse so, if different space groups.

cannibalize the following script (which is part of our automagic
pipeline, i will leave fixing up the details, variables and file names
to you), if you like.

cheers

ingo


#
# rotating maps onto reference structure 


#
#
#   ... calculate 2fofc and fofc map suitable for rotating
#
fft \
HKLIN $finalmtz \
MAPOUT $TMPDIR/mapout.fft_final2fofc.map.$run << eof-fft >> cxap_deposit.out.$run
title 2FO-1FC
labin -
  F1 = FWT -
  PHI = PHWT
xyzlim 0.0 1.0 0.0 1.0 0.0 1.0
end
eof-fft
#
fft \
HKLIN $finalmtz \
MAPOUT $TMPDIR/mapout.fft_finalfofc.map.$run << eof-fft >> cxap_deposit.out.$run
title 1FO-1FC
labin -
  F1 = DELFWT -
  PHI = PHDELWT
xyzlim 0.0 1.0 0.0 1.0 0.0 1.0
end
eof-fft
#
#
#
ncsmask \
XYZIN $referencepdb \
MSKOUT $TMPDIR/mapout.mapmask_referencepdb.map.$run << eof-ncsmask >> cxap_deposit.out.$run
radius 5.0
eof-ncsmask
#
xlim=`egrep -A1 "After trim:" cxap_deposit.out.$run | tail -1 | awk '{printf("%5i%5i",$2,$4)}'`
ylim=`egrep -A2 "After trim:" cxap_deposit.out.$run | tail -1 | awk '{printf("%5i%5i",$2,$4)}'`
zlim=`egrep -A3 "After trim:" cxap_deposit.out.$run | tail -1 | awk '{printf("%5i%5i",$2,$4)}'`
gridwork=`grep "Grid sampling" cxap_deposit.out.$run | tail -1 | awk '{print $8, $9, $10}'`
cellwork=`grep  "Cell dimensions" cxap_deposit.out.$run | tail -1 | awk '{print $4, $5, $6, $7, $8, $9}'`
#
#   ... the operator obtained above has to be inverted, since we do need
#   the 'fetching' operator here, not the 'putting' one
#
#   ... symmetry of the resulting map must be P1, since, of course,
#   nothing at all would square out, if you move the molecule but
#   maintain the original cell
#
maprot \
MAPIN $TMPDIR/mapout.fft_final2fofc.map.$run \
MSKIN $TMPDIR/mapout.mapmask_referencepdb.map.$run \
WRKOUT $rot2fofc << eof-maprot >> cxap_deposit.out.$run
mode from
symmetry P1
average
rota matrix $row1 $row2 $row3
trans $row4
invert
eof-maprot
#
#
maprot \
MAPIN $TMPDIR/mapout.fft_finalfofc.map.$run \
MSKIN $TMPDIR/mapout.mapmask_referencepdb.map.$run \
WRKOUT $rotfofc << eof-maprot >> cxap_deposit.out.$run
mode from
symmetry P1
average
rota matrix $row1 $row2 $row3
trans $row4
invert
eof-maprot
#
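
The inversion of the operator mentioned in the script comments can be sketched
as below. This shows only the underlying algebra, under the assumption that the
superposition operator is a pure rotation R plus a translation t acting as
x' = R x + t; the function names are hypothetical, and whether MAPROT's own
conventions (for example orthogonal vs. fractional coordinates) match should be
checked against its documentation.

```python
def invert_operator(rot, trans):
    """Invert x' = R x + t for a pure rotation R.

    rot   -- 3x3 rotation matrix as nested lists (rows)
    trans -- translation vector of length 3
    Returns (r_inv, t_inv) such that x = r_inv x' + t_inv.
    """
    # The inverse of a rotation matrix is its transpose.
    r_inv = [[rot[j][i] for j in range(3)] for i in range(3)]
    # The inverse translation is -R^T t.
    t_inv = [-sum(r_inv[i][j] * trans[j] for j in range(3)) for i in range(3)]
    return r_inv, t_inv


def apply_operator(rot, trans, xyz):
    """Apply x' = R x + t to a single point."""
    return [sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i]
            for i in range(3)]


# Example: 90-degree rotation about z followed by a shift.
rot = [[0.0, -1.0, 0.0],
       [1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0]]
trans = [5.0, 0.0, -2.0]

r_inv, t_inv = invert_operator(rot, trans)
moved = apply_operator(rot, trans, [1.0, 2.0, 3.0])
restored = apply_operator(r_inv, t_inv, moved)  # back at the starting point
```

Applying the operator and then its inverse returns the original point, which is
a quick sanity check before feeding either matrix to a map-rotation program.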


Re: [ccp4bb] Rfree in similar data set

2009-09-25 Thread Ian Tickle
Hi Ed,

I already did essentially that many years ago for the Rfree papers in
Acta (sorry, I no longer have exactly the data I used, since
those exact datasets weren't deposited, but it would not be too hard for
anyone to reconstruct the experiment along the lines you suggest).  The
conclusion was that it made no difference, the Rfree is the same (or at
least insignificantly different) *provided* that in doing your shaking
you haven't shaken it into a different local optimum (usually with a
worse likelihood) - then obviously you don't expect to get the same
answer.  To me this conclusion is hardly earth-shattering - if you
refine two different models using the same data to the same optimum, you
must get the same Rfree, since Rfree (and all other refinement
statistics) depend only on the current model parameters and not on
anything you did previously.
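
To make the last point concrete, here is a minimal sketch of the conventional
R-factor split into working and test sets. It is an illustration only, with
hypothetical function names and made-up amplitudes, and it ignores the scaling
and bulk-solvent terms that real refinement programs apply. The point is that
the result is a function of the observed amplitudes, the amplitudes calculated
from the current model, and the free flags, and of nothing else.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional R = sum|Fobs - k*Fcalc| / sum(Fobs)."""
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def r_work_and_free(f_obs, f_calc, free_flags, scale=1.0):
    """Split reflections by free flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, fr in zip(f_obs, f_calc, free_flags) if not fr]
    free = [(fo, fc) for fo, fc, fr in zip(f_obs, f_calc, free_flags) if fr]
    r_work = r_factor([w[0] for w in work], [w[1] for w in work], scale)
    r_free = r_factor([f[0] for f in free], [f[1] for f in free], scale)
    return r_work, r_free


# Made-up amplitudes: three working reflections and one test reflection.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 60.0, 50.0]
free_flags = [False, False, False, True]

r_work, r_free = r_work_and_free(f_obs, f_calc, free_flags)
```

Two refinement paths that end at the same model give the same f_calc, and
therefore the same R_work and R_free for a given set of flags.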

Cheers

-- Ian

> -Original Message-
> From: owner-ccp...@jiscmail.ac.uk [mailto:owner-ccp...@jiscmail.ac.uk] On
> Behalf Of Edward A. Berry
> Sent: 24 September 2009 20:53
> To: owner-ccp...@jiscmail.ac.uk
> Cc: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] Rfree in similar data set
> 
> This issue has come up from time to time, and I don't think
> anyone has been convinced to change their mind by the
> theoretical discussions.
> But isn't this amenable to test experimentally?
> Given the ready availability of CPU time . . .
> 
> Take a particular structure, preferably a deposited PDB structure
> so that it is a fixed starting point available to everyone.
> Select a refinement strategy (which will determine radius of
> convergence?).
> 
> 1. Refine the structure to convergence with the original free set.
> R-free may be different from original depositors because of different
> strategy, refinement target, and bulk solvent model, so this will
> be the reference to compare subsequent results with.
> 
> 2. Take that structure (1), select a new R-free set, again refine
> to convergence. R-free will be different, but should not be significantly
> different, from that in 1. Might need to refine against 5 or 10 different
> R-free choices to see what is significant.
>
> 3. Take the structure from (2), refine using the original free set.
> According to one school, R-free is hopelessly corrupted and will never
> rise to the original level (1) unless drastic refinement steps are
> taken to "shake out" the bias.
> The other school would predict R-free to converge on the same value (1),
> provided drastic steps are *not* taken, as they might allow the refinement
> to jump into a different local maximum.
> 
> Then measure RMS- and maximum- atomic deviations between the models
> and see if there are any differences that a PDB user would care about.
> 
> This does not directly address the original poster's question, in which
> a new set of data is being used. However I think we would agree that
> if there is no bias when exactly the same data is being used, there
> would be none in the case of different data.
> 
> Even if the R-free is biased by this test, it may not be in the
> case of a new dataset- the "noise" which is being "overfit" in the
> new dataset could be completely independent from that in the old.
> However I think it is generally agreed that there is a component
> of "noise" (in the most general sense, meaning the difference
> between what is calculated from our best model and what is observed)
> which is common between different crystals.
> 
> Ed
> 
> Ian Tickle wrote:
> >> -Original Message-
> >> From: Dale Tronrud [mailto:det...@uoxray.uoregon.edu]
> >> Sent: 24 September 2009 17:21
> >> To: Ian Tickle
> >> Cc: CCP4BB@JISCMAIL.AC.UK
> >> Subject: Re: [ccp4bb] Rfree in similar data set
> >
> >>While I agree with Ian on the theoretical level, in practice
> >> people use free R's to make decisions before the ultimate model
> >> is finished, and our refinement programs are still limited in
> >> their abilities to find even a local minimum.
> >
> > I wasn't saying that Rfree is only useful for the ultimate finished
> > model.  My argument also applies to all intermediate models; the
> > criterion is that the refinement has converged against the current
> > working set, even if it is only an incomplete model, or if it is only to
> > a local optimum.  So it's perfectly possible to use Rfree for
> > overfitting & other tests on intermediate models.  The point is that it
> > doesn't matter how you arrived at that optimum (whether local or
> > global), Rfree is a function only of the parameters at that point, not
> > of any previous history.  If you arrived at that same local or global
> > optimum via a path which didn't involve switching datasets midway, you
> > must get the same answer for Rfree, so I just don't see how it can be
> > biased one way and not biased the other.  Note that this is meant as a
> > 'thought experiment', I'm not saying necessarily that it's possible to
> > perform this experiment in practice!
> >
> >>On the 

Re: [ccp4bb] Rfree in similar data set

2009-09-25 Thread Ian Tickle
Hi Tom

Attainment of the global optimum is not a necessary condition for the
argument to hold, it was merely an example, but I agree with you that
maybe it wasn't such a good example from a practical point of view! -
but it was intended only as a hypothetical example to illustrate the
point I was making.  This was that the same optimum with the same value
of Rfree can be reached by many different paths some of which might
involve switching the test set midway (i.e. the ones claimed to be
biased), and some where the same test set is used throughout (i.e. the
ones we're all agreed are unbiased); obviously in each case the final
refinement must use the same test set for any comparison of the Rfree's
to be valid.  However it's a logical impossibility (i.e. in essence it
comes down to a reductio ad absurdum to the equation '0=1') for the same
Rfree at the same optimum to be both biased and unbiased (bias of course
being the difference between the expectation and the true value).  The
*only* necessary (and sufficient) condition is that the refinement with
the new data has converged, whether it's to a global or local optimum
makes no essential difference, so that the Rfree for the parameters at
that optimum is meaningful and any previous bias is removed.

Note that bias in Rfree arises because the model parameters are
unavoidably overfitted to the 'noise' in the data (i.e. random
experimental errors in Iobs or Fobs), whereas what we want is to fit the
parameters to only the 'signal' in the data (i.e. differences between
Fobs and Fcalc which relate only to real differences in the model).
Unfortunately optimization algorithms are unable to make any distinction
between fitting signal and noise, so of course we end up fitting both.
When we fit the model to a new set of data, the parameters are re-fitted
to the signal and noise in the new data, and any 'memory' of fitting to
the old data, along with any bias in Rfree due to fitting the noise in
the old data, is completely replaced at convergence by the 'memory' of
fitting to the new data.

Cheers

-- Ian

> -Original Message-
> From: owner-ccp...@jiscmail.ac.uk [mailto:owner-ccp...@jiscmail.ac.uk] On
> Behalf Of Tom Terwilliger
> Sent: 24 September 2009 16:58
> To: Ian Tickle
> Cc: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] Rfree in similar data set
> 
> Hi Ian,
> 
> Surely you are correct that  "...once all issues of local optima are
> resolved, by whatever means it takes, you will end up at the same unique
> global optimum no matter where you started from."   However the key here
> is "by whatever means it takes".  I think that in practice there are a
> vast number of local minima in this problem.  You can rebuild a model from
> the PDB that is highly refined and find many other models that have
> R-factors that are the same or better, and all can be refined to a stable
> "minimum".  All of course are very similar and differ principally in
> side-chain conformations and small main chain differences.   I think that
> means it is very difficult to find the global minimum.
>
> In practice, relative to the Rfree set discussion that started this, I
> think this also means that once an Rfree set is chosen and a model has
> been refined using that Rfree set, the Rfree set should be kept.
> 
> All the best,
> Tom T
> 
> On Sep 24, 2009, at 9:41 AM, Ian Tickle wrote:
>
>   -Original Message-
>   From: owner-ccp...@jiscmail.ac.uk [mailto:owner-ccp...@jiscmail.ac.uk]
>   On Behalf Of Eric Bennett
>   Sent: 24 September 2009 13:31
>   To: CCP4BB@JISCMAIL.AC.UK
>   Subject: Re: [ccp4bb] Rfree in similar data set
>
>   Ian Tickle wrote:
>
>     For that to be true it would have to be possible to arrive at a
>     different unbiased Rfree from another starting point.  But provided
>     your starting point wasn't a local maximum LL and you haven't gotten
>     into a local maximum along the way, convergence will be to a unique
>     global maximum of the LL, so the Rfree must be the same whatever
>     starting point is used (within the radius of convergence of course).
>
>   But if you're using a different set of data the minima and maxima of
>   the function aren't necessarily going to be in the same place.  Rfree
>   is supposed to inform about overfitting.  In an overfitting situation
>   there are multiple possible models which describe the data well and
>   which overfit solution you end up with could be sensitive to the data
>   set used.  The provisions that you haven't gotten stuck in