Re: [ccp4bb] validating a homology model

2018-03-03 Thread Ed Pozharski


Assuming that your homology model is that of a dimer, you could put it in a 
large unit cell (just add a CRYST1 record).  The only interface PISA will then 
report is your dimer interface.
If your homology model is a monomer, then PISA will not help, of course, and 
you would need to predict the dimerization mode first. 
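
For example, a minimal sketch of the trick in Python (the file names and the 
100 A padding are my own arbitrary choices; any PDB-handling library would do 
equally well):

# Sketch: prepend a large P1 CRYST1 record to a homology model so that
# PISA sees no lattice contacts other than the dimer interface itself.
xs, ys, zs = [], [], []
with open("dimer_model.pdb") as f:          # hypothetical file name
    lines = f.readlines()
for line in lines:
    if line.startswith(("ATOM", "HETATM")):
        xs.append(float(line[30:38]))
        ys.append(float(line[38:46]))
        zs.append(float(line[46:54]))

pad = 100.0  # generous padding so symmetry mates cannot touch
a = max(xs) - min(xs) + pad
b = max(ys) - min(ys) + pad
c = max(zs) - min(zs) + pad

cryst1 = "CRYST1%9.3f%9.3f%9.3f  90.00  90.00  90.00 P 1           1\n" % (a, b, c)
with open("dimer_model_boxed.pdb", "w") as f:
    f.write(cryst1)
    f.writelines(l for l in lines if not l.startswith("CRYST1"))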



 Original message 
From: Careina Edgooms <02531c126adf-dmarc-requ...@jiscmail.ac.uk> 
Date: 3/2/18  6:44 AM  (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] validating a homology model 

Dear all
What programs are best used to validate homology models? I know of molprobity, 
but if there are no deposited coordinates I cannot use it. Is there a way to 
use such programs with homology models?
Also, I wish to use pdbepisa to characterise a dimer interface, but again, for 
a homology model this cannot be done as there is no PDB entry. Does anybody 
know a way to use the PISA software on my own model that is not deposited in 
the PDB?
Thank you in advance
Careina

Re: [ccp4bb] question related to MTZ file

2015-01-03 Thread Ed Pozharski
I suspect the question might be about converting intensities to amplitudes.  If 
so, ctruncate is your friend.
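
If it helps, a minimal sketch of driving that conversion from Python (the file 
names and the IMEAN/SIGIMEAN column labels are assumptions - inspect the mtz 
first, e.g. with mtzdmp, and check the ctruncate documentation for the exact 
options):

import subprocess

# Sketch: convert merged intensities to amplitudes with ctruncate.
subprocess.run(
    ["ctruncate",
     "-mtzin", "scaled.mtz",             # hypothetical input file
     "-mtzout", "scaled_truncate.mtz",
     "-colin", "/*/*/[IMEAN,SIGIMEAN]"], # assumed column labels
    check=True,
)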




 Original message 
From: Eleanor Dodson  
Date: 01/03/2015 3:42 PM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] question related to MTZ file 

I don't really understand your question. 
An "mtz file" is simply a format to hold at least a  list of 
h k l Observations Sigma_observations 

It is a binary file format which makes it very suitable to store information 
with a large range of values - viz. observations. 


Then other information pertinent to reflection indices can be added to the file

e.g. IF you have a model you could calculate Fcalc and PHIcalc and append them

IF you had some experimental information to help you determine phases such as 
anomalous measurements you may want to store these, plus the derived phases and 
Figs_of_Merit.

Is this any help?  Phases have to be based on extra information..

Eleanor



On 3 January 2015 at 04:00, luzuok  wrote:
How "initial" this MTZ file? 
Does this mtz file contain phase information? 

Lu Zuokun




--
Lu Zuokun
New Biology Building A202, Nankai University

On 2015-01-02 00:52:53, "Jurgen Bosch" wrote:

I assume that by "initial mtz file" you mean the one derived from Mosflm after 
processing your images. If so, then how about running that file through scala 
http://www.ccp4.ac.uk/html/scala.html ?
I bet there are some tutorials on the CCP4 webpage on that topic.

Jürgen

..
Jürgen Bosch
Johns Hopkins University
Bloomberg School of Public Health
Department of Biochemistry & Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Office: +1-410-614-4742
Lab:  +1-410-614-4894
Fax:  +1-410-955-2926
http://lupo.jhsph.edu

On Dec 31, 2014, at 9:19 PM, Dialing Pretty 
<03f1d08ed29c-dmarc-requ...@jiscmail.ac.uk> wrote:

Dear All,

After I got the initial MTZ file without the structure factors, will you please 
tell me by which CCP4 program I can get the MTZ file with the structure factors?

I am looking forward to getting your reply.

Dialing






Re: [ccp4bb] water at exactly the same position

2014-10-29 Thread Ed Pozharski
Why? Merging waters like that in coot is a user error, and it does not happen 
too often.  I realize you are talking about changeable settings, but it would 
be really annoying if a refinement program kept removing water molecules that 
were placed manually because they do not fit some internal standard. Adding 
waters by default may obscure density that is due to other components.

I just feel one should be careful when changing the chemical composition of a model. 
That is something algorithms do not (yet) do well. 



 Original message 
From: Pavel Afonine  
Date: 10/30/2014 12:27 AM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] water at exactly the same position 
This conversation makes me more and more certain that water update 
(add/remove/refine water) done as part of (a default) refinement run is a good 
idea -:)

Pavel

On Wed, Oct 29, 2014 at 5:08 AM, luzuok  wrote:
Dear all,
I found that there are some water molecules in my pdb that share the same 
position. This may be caused by merging molecules in coot; it seems that I have 
merged water molecules into my protein more than once. 
Can anyone tell me how to fix this problem?

Best regards!
Lu Zuokun




--
Lu Zuokun
New Biology Building A202, Nankai University





Re: [ccp4bb] Merge PDB chains

2014-10-23 Thread Ed Pozharski
Edit the ATOM records?
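
If doing that by hand gets tedious, a minimal sketch of the edit in Python (the 
file names and the target chain ID are made up; it assumes the SO4 molecules 
are HETATM records, and residues may still need renumbering afterwards to avoid 
duplicate residue numbers):

# Sketch: move every SO4 HETATM record into a single chain "S".
# PDB format: residue name is columns 18-20, chain ID is column 22.
with open("model.pdb") as fin, open("model_merged.pdb", "w") as fout:
    for line in fin:
        if line.startswith("HETATM") and line[17:20] == "SO4":
            line = line[:21] + "S" + line[22:]
        fout.write(line)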



 Original message 
From: luzuok  
Date: 10/23/2014 6:51 AM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] Merge PDB chains 
Dear all,
   Sorry to ask a simple question. There are many SO4 molecules in my PDB file, 
each belonging to a different chain. I want to merge them into one chain; can 
anyone tell me how to do this?

Best regards!

Lu Zuokun




--
Lu Zuokun
New Biology Building A202, Nankai University




Re: [ccp4bb] I222 - P22121 space-group ambiguity

2014-10-13 Thread Ed Pozharski
Yes, having different crystal forms in the same crystallization conditions is, 
while clearly uncommon, not unheard of. I had a case once where at least four 
different crystal forms were observed in crystals harvested from identically 
prepared drops.  It may be that there is some major set of contacts that 
promote a certain arrangement of protein molecules (fiber-like), and these can 
then be packed in multiple ways as controlled by another set of weaker 
interactions. 



 Original message 
From: Florian Schmitzberger  
Date: 10/13/2014 4:47 AM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] I222 - P22121 space-group ambiguity 
Hi everybody,

I collected a number of X-ray data sets from crystals originating from the same 
cryst. drop. I solved the initial structure in P22121 space group by MR with 
Phaser locating two molecules (data to ~ 2.1 Angstr.); refined R/Rfree: 
0.213/0.244.

Processing of some of the other data sets with XDS/Aimless is consistent with 
I222 space group (resolution ~ 2.6 Ang.). I can locate one molecule. The 
unit-cell dimensions for I222 and the initial P22121 space groups for two of 
the data sets are:
I222: a=87.8 b=101.18 c=123.63; P22121: a=93.34 b=105.47 c=122.98;

I superposed the molecule in I222 onto one of the two located for the initially 
solved P22121; the orientation of the NCS-related molecule in P22121 differs 
from the crystallographic-symmetry related one in I222. Trying to solve this 
P22121 data set in I222 with MR does not result in high Z scores, and maps do 
not look good.

Some of the data sets that process in I222 to ~ 3 Angstr., I can also solve in 
P22121, locating two molecules (differences may not be that clear in this case, 
since the resolution is lower).

Some other data sets process in P22121 with Aimless; with a substantial 
off-origin Patterson peak, indicating translational NCS. For these, Phaser 
positions two molecules that are related by crystallographic translational NCS. 
These two molecules are crystallographic-symmetry related in the original 
P22121 data set. I can also solve these data sets in I222 space group, with the 
overall Z score higher than for the P22121 data. 

I am uncertain what the ‘true’ space group for some of my data sets is. Could 
it be that for data that process in P22121, but can be solved in I222, 
reflections that would indicate I222 space group were not collected? 
Alternatively, perhaps what I am seeing is that there is a (gradual) transition 
of the crystal lattice (between P22121 and I222 or vice versa), caused by 
variation in crystal handling/cooling or exposure to X-rays.

It’s relevant to me, because in P22121 space group, a region of the molecule 
that is of biological interest makes NCS-related crystal contacts that are 
crystallographic-symmetry related in I222.

Has anybody observed similar cases? I would appreciate comments.

Cheers,

Florian




[ccp4bb]

2014-08-19 Thread Ed Pozharski
Try refining your model both ways (with and without the covalent link) and see 
if the electron density maps give you an indication.  At this resolution there 
will be 
some model bias, so be critical. 



 Original message 
From: rohit kumar  
Date: 08/19/2014 2:56 AM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] 
Dear All,
I have solved a structure at 3.2 A resolution. It is a PLP-dependent enzyme. 
In their resting state, PLP-dependent enzymes are usually joined by a covalent 
aldimine linkage to an essential lysine residue with a C=N bond. This generates 
the so-called internal aldimine moiety.
Could anybody tell me how we can say whether this PLP is covalently attached to 
the Lys (i.e. making the Schiff base) or not. 

-- 
WITH REGARDS
Rohit Kumar Singh
Lab. no. 430,
P.I. Dr. S. Gourinath,
School of Life Sciences,
Jawaharlal Nehru University
New Delhi -110067


Re: [ccp4bb] CC-half value ??

2014-08-15 Thread Ed Pozharski
Same here.  Ultimately, the KD test must be used to finalize the resolution 
(keeping in mind recently discussed issues of effective resolution given data 
completeness).  I just want to add that at least some versions of aimless 
report an overestimated resolution based on the CC1/2 cutoff when outliers are 
present (e.g. due to ice rings or salt diffraction). It seems that aimless just 
picks the highest resolution bin where CC1/2 > 0.5 even if some lower resolution 
bins are below 0.5 as well. I have written a script for more robust automated 
evaluation of these curves.  In a nutshell, it fits the CC1/2(d) curve to 
1/(1+exp(-x)) and returns the resolution at the midpoint.  I'm pretty sure that 
the theoretical CC1/2(d) dependence is different from this, but it seems good 
enough for a rough estimate. 
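
Something along these lines (a minimal re-creation of the idea, not the actual 
script; the per-bin values are made up for illustration, and here the curve is 
fit against 1/d^2 rather than d):

import numpy as np
from scipy.optimize import curve_fit

# Per-bin resolution (A) and CC1/2, e.g. from the aimless log - made-up values.
d = np.array([4.0, 3.2, 2.8, 2.5, 2.3, 2.1, 2.0, 1.9, 1.8, 1.7])
cc = np.array([0.999, 0.998, 0.995, 0.98, 0.95, 0.85, 0.65, 0.45, 0.25, 0.12])

def logistic(s2, s2_mid, k):
    # CC1/2 modelled as 1/(1+exp(k*(s2-s2_mid))), where s2 = 1/d^2;
    # s2_mid is the point where the fitted CC1/2 crosses 0.5.
    return 1.0 / (1.0 + np.exp(k * (s2 - s2_mid)))

s2 = 1.0 / d**2
(s2_mid, k), _ = curve_fit(logistic, s2, cc, p0=[s2[-3], 50.0])
print("estimated resolution cutoff: %.2f A" % (1.0 / np.sqrt(s2_mid)))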









 Original message 
From: Roger Rowlett  
Date: 08/14/2014 5:44 PM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] CC-half value ?? 
Exactly. Aimless will give you suggested resolution cutoffs based on CC1/2 in 
the log file.

Roger Rowlett

On Aug 14, 2014 5:04 PM, "conan仙人指路"  wrote:
Hi Faisal,

  The CC-half criterion is valuable in evaluating the cut-off of the highest 
resolution. Sometimes, even if I/sigI is close to 1 and completeness is not as 
high, it may be worth incorporating the extra high-resolution shell data and 
extending the resolution if CC-half is still significant - provided that the 
reliability and lack of bias are carefully confirmed, and that the apparently 
significant CC-half is not an artifact of some other factor such as an ice ring.
(Ref: Karplus PA and Diederichs K. 2012 Science 336, 1030-1033 
https://www.pubmed.com/pubmed/22628654)

  It has yet to be appreciated by much of the crystallographic community, 
unlike I/sigI, completeness and Rsym. In particular, Rsym has gradually become 
less of a direct measure of data quality and/or a determinant of the 
resolution cut-off. 

Best,
Conan

Hongnan Cao, Ph.D.
Department of Biochemistry
Rice University

Date: Fri, 15 Aug 2014 01:39:48 +0530
From: faisaltari...@gmail.com
Subject: [ccp4bb] CC-half value ??
To: CCP4BB@JISCMAIL.AC.UK

Dear all

How does the CC-half value of a data set determine the maximum resolution limit 
during data processing? Although we know much about the Rsym and I/sigI values 
of the highest resolution shell while processing the data, what are the 
parameters we need to check related to CC-half values? 

-- 
Regards

Faisal
School of Life Sciences
JNU



Re: [ccp4bb] cc1/2 value in hkl2000

2014-08-15 Thread Ed Pozharski
Probably the only way is to take unmerged scalepack output to aimless.



 Original message 
From: Faisal Tarique  
Date: 08/15/2014 9:40 AM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] cc1/2 value in hkl2000 
Hello everyone

Can anybody please tell me where to locate the correlation between half 
sets (CC1/2) for data processed with HKL2000? 

-- 
Regards

Faisal
School of Life Sciences
JNU



Re: [ccp4bb] thin shell Rfree set selection

2014-08-13 Thread Ed Pozharski
By all means, try it both ways and see whether the R-Rfree gap narrows with 
random vs thin shell selection. Depending on resolution and data quality, you 
may also consider imposing NCS restraints.



 Original message 
From: Xianchi Dong  
Date: 08/12/2014 4:45 PM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] thin shell Rfree set selection 
Dear all,
I have a dataset of C2 symmetry and 2 molecules in the ASU. I am wondering if I 
have to use thin shell Rfree set selection to avoid Rfree bias by NCS. And 
which Rfree selection method is better?

Thanks in advance.

Xianchi

Re: [ccp4bb] The modified DNA could not be linked to the downstream DNA in Coot?

2014-07-15 Thread Ed Pozharski
You may need to

1) modify _chem_comp.group to be "DNA" in the cif-file (one way of scripting 
this is sketched below)
2) import the resulting cif-file into coot
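
A minimal sketch of step 1 as a plain-text edit in Python (it assumes a 
CCP4-style monomer cif where the loop over _chem_comp has a single data row; 
verify the order of the _chem_comp.* tags in your own file before trusting it):

import shlex

# Sketch: set _chem_comp.group to "DNA" in a monomer dictionary.
with open("ligand.cif") as f:              # hypothetical file names
    lines = f.readlines()

tags, out = [], []
for line in lines:
    s = line.strip()
    if s.startswith("_chem_comp."):
        tags.append(s.split()[0])          # tag names of the current loop
    elif tags and s and not s.startswith(("loop_", "#", "_")):
        fields = shlex.split(s)            # handles quoted names with spaces
        if "_chem_comp.group" in tags and len(fields) == len(tags):
            fields[tags.index("_chem_comp.group")] = "DNA"
            line = " ".join("'%s'" % x if " " in x else x for x in fields) + "\n"
        tags = []                          # only the first data row matters
    out.append(line)

with open("ligand_dna.cif", "w") as f:
    f.writelines(out)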

Cheers



 Original message From: "Wang, Wei" 
 Date:07/15/2014  5:14 AM  (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] The modified 
DNA could not be linked to the downstream DNA in the COOT? 
Hi,

There is a problem with modified DNA refinement.

I generated a CIF file for a modified DNA base using the eLBOW software of 
PHENIX. However, I found that the modified DNA could not be linked to the 
downstream DNA when I refined it in Coot. Then I generated another CIF file 
using the sketcher of CCP4, but the problem still existed.

Can any expert help me?

Thanks!!
Best

Wei

Re: [ccp4bb] emergency substitute for RT loop cover?

2014-07-07 Thread Ed Pozharski
Try putting your crystal into an oil drop? 



 Original message 
From: Frank von Delft  
Date: 07/07/2014 12:32 PM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] emergency substitute for RT loop cover? 
Hi all

Pretend you were stuck having to do RT data collection but without 
access to either Mitegen MicroRT Capillaries or the more old-fashioned 
quartz capillaries, to pop over the loop.

Anybody have suggestions of alternative ways of doing this?  I do want 
to use loops (I never learnt how to suck up crystals in capillaries).

I have access to a passably stocked biochemistry teaching lab, and could 
at a pinch go rifle some more advanced research labs.  (No, I'm not at 
home ;)

Thanks!
phx


Re: [ccp4bb] packing test PHASER

2014-06-17 Thread Ed Pozharski
Roger,

Given that the PAK value is the number of clashing residues, I doubt that in 
this case it is a loop clash etc.  

As a general comment, though, you are absolutely correct. 

Cheers, 
Ed



 Original message 
From: Roger Rowlett  
Date: 06/17/2014 10:52 AM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] packing test PHASER 
Increase the number of allowed clashes in Phaser, re-run it, then look at 
the packing of the solution found and identify the source of the clashes. 
Possibilities for the clash issue include:
- Wrong space group
- Flexible loops or termini in the search model not present or differently 
arranged in your crystal target
Once you look at the packing in Coot or Pymol, you will have a good idea of 
what to do next. If the problem is flexible loops or misplaced N- or C-termini, 
you can delete these regions from your search model (they are not helping you 
phase anyway) and re-run Phaser with an appropriately truncated search model.
Cheers,

___
Roger S. Rowlett
Gordon & Dorothy Kline Professor
Department of Chemistry
Colgate University
13 Oak Drive
Hamilton, NY 13346

tel: (315)-228-7245
ofc: (315)-228-7395
fax: (315)-228-7935
email: rrowl...@colgate.edu

On 6/17/2014 10:44 AM, Almudena Ponce Salvatierra wrote:
Dear ccp4 users, 

I get the following message from Phaser when I do a molecular replacement with 
two ensembles. One of the ensembles is placed but the second one is not placed, 
and then it says this:

Solutions with Z-scores greater than 13.0 (the threshold indicating a definite 
solution) were rejected for failing the packing test:
#1 TFZ=16.8 PAK=487
#2 TFZ=17.4 PAK=440
#3 TFZ=17.0 PAK=312
#4 TFZ=16.7 PAK=294
#5 TFZ=16.8 PAK=227
#6 TFZ=16.6 PAK=284
#7 TFZ=16.2 PAK=219
#8 TFZ=16.3 PAK=287
#9 TFZ=16.1 PAK=186
#10 TFZ=16.6 PAK=277
#11 TFZ=16.7 PAK=204
#12 TFZ=16.8 PAK=404
#13 TFZ=15.9 PAK=271
#14 TFZ=15.8 PAK=194
#15 TFZ=14.9 PAK=229
#16 TFZ=16.4 PAK=368
#17 TFZ=17.3 PAK=194
#18 TFZ=15.2 PAK=240
#19 TFZ=16.1 PAK=325
#20 TFZ=15.3 PAK=455
#21 TFZ=16.5 PAK=298
#22 TFZ=16.6 PAK=290
#23 TFZ=15.2 PAK=259
#24 TFZ=16.2 PAK=194
#25 TFZ=16.1 PAK=314
#26 TFZ=16.3 PAK=194
#27 TFZ=16.3 PAK=387
#28 TFZ=15.6 PAK=193
#29 TFZ=15.7 PAK=219
#30 TFZ=15.6 PAK=474
#31 TFZ=15.1 PAK=194
#32 TFZ=15.4 PAK=339
#33 TFZ=15.5 PAK=210
#34 TFZ=15.2 PAK=300
#35 TFZ=15.6 PAK=186
#36 TFZ=16.2 PAK=216
#37 TFZ=14.9 PAK=182
#38 TFZ=15.7 PAK=279
#39 TFZ=15.5 PAK=285
#40 TFZ=15.0 PAK=374
#41 TFZ=15.4 PAK=404
#42 TFZ=15.3 PAK=185
#43 TFZ=15.6 PAK=227
#44 TFZ=14.8 PAK=364
#45 TFZ=15.7 PAK=193
#46 TFZ=14.5 PAK=448
#47 TFZ=14.6 PAK=219
#48 TFZ=15.5 PAK=210
#49 TFZ=15.7 PAK=189
#50 TFZ=15.0 PAK=193
#51 TFZ=15.0 PAK=281
#52 TFZ=15.2 PAK=202

And the list still continues for a bit. How should I think about this? From 
those numbers alone I would assume that there are simply too many clashes, but 
maybe there is something else to take into account?

Thanks a lot in advance.

Best wishes, 

Almudena


-- 
Almudena Ponce-Salvatierra
Macromolecular crystallography and Nucleic acid chemistry
Max Planck Institute for Biophysical Chemistry
Am Fassberg 11 37077 Göttingen
Germany




Re: [ccp4bb] Kabat, insertion codes & refinement

2014-06-16 Thread Ed Pozharski
There is no actual requirement to use Kabat numbering; you can avoid it 
altogether.  Some argue that L27A is actually the 28th amino acid in the 
protein sequence, and labeling it as L27A is simply incorrect.  I would suggest 
doing refinement with plain numbering (no insertion codes) and changing it only 
for the final model if needed for comparative analysis. 

Ed



 Original message From: "Hargreaves, David" 
 Date:06/16/2014  6:07 AM  
(GMT-05:00) To: CCP4BB@JISCMAIL.AC.UK Subject: [ccp4bb] 
Kabat, insertion codes & refinement 
Dear CCP4bb,
 
I’m refining an antibody structure which requires Kabat residue numbering with 
insertion codes. My setup of Refmac5 and Buster both break peptide bonds 
between some (not all) of the residues with insertion codes. I was wondering 
whether there is a special way of handling these residues in refinement?
 
Thanks,
 
David
 
David Hargreaves
Associate Principal Scientist
_
AstraZeneca
Discovery Sciences, Structure & Biophysics
Mereside, 50F49, Alderley Park, Cheshire, SK10 4TF
Tel +44 (0)01625 518521  Fax +44 (0) 1625 232693
David.Hargreaves @astrazeneca.com
 



Re: [ccp4bb] metal ion dominating density!

2014-06-05 Thread Ed Pozharski
Or, if for whatever reason sphere refine isn't an option, fix the metal and all 
of its ligands



 Original message 
From: Paul Emsley  
Date: 06/05/2014 3:51 AM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] metal ion dominating density! 
On 05/06/14 08:27, Dean Derbyshire wrote:
>
> Hi all, I recall there was a post a while back, but silly me can’t 
> remember the details –
>
> How does one stop metal ions dominating real space refinement in COOT?
>
>

sphere refine?

Paul.


Re: [ccp4bb] baverage: no tables were found in this file

2014-06-03 Thread Ed Pozharski
Would a no-space symlink resolve this?
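
For instance (a throwaway sketch; the paths are made up):

import os

# Sketch: expose a space-containing folder under a space-free alias that
# older CCP4 programs can digest.
os.symlink("/home/user/Google Drive", "/home/user/GoogleDrive")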



 Original message 
From: Mark J van Raaij  
Date: 06/03/2014 8:03 AM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] baverage: no tables were found in this file 
completely agree with avoiding spaces in paths and file names, but Google 
Drive is very handy for sharing files, so every once in a while I forget to 
move a file someone shares with me before running Unix programs on it and get 
strange errors. Depending on how awake one is at the time, finding out the 
source can be time-consuming... Google should really remove the space!

Mark J van Raaij
Lab 20B
Dpto de Estructura de Macromoleculas
Centro Nacional de Biotecnologia - CSIC
c/Darwin 3
E-28049 Madrid, Spain
tel. (+34) 91 585 4616
http://www.cnb.csic.es/~mjvanraaij





On 3 Jun 2014, at 13:47, Eugene Krissinel wrote:

> I take this chance to confirm once more, as publicly as possible, that 
> file paths with spaces are discouraged in today's CCP4. This inconvenience 
> originates from ancient times in computing when a good half of CCP4 was written 
> and when spaces were disallowed on file system nodes.
> 
> Please take notice of this fact, as CCP4 core still receives (albeit 
> infrequent) bug reports where surprising behaviour is due to using file 
> paths with white spaces.
> 
> Fixing this has proved to be a hard problem, purely because of technical 
> choices made quite a number of years ago. But the good news is that this 
> limitation will be removed in the new CCP4 GUI under development.
> 
> Eugene
> 
> On 3 Jun 2014, at 08:23, Mark J van Raaij wrote:
> 
>> This also occurred to me once where the file path had a space,(/Google 
>> Drive/), when I moved the file somewhere else it worked. I was using 
>> baverage from the CCP4i GUI.
>> 
>> Mark J van Raaij
>> Lab 20B
>> Dpto de Estructura de Macromoleculas
>> Centro Nacional de Biotecnologia - CSIC
>> c/Darwin 3
>> E-28049 Madrid, Spain
>> tel. (+34) 91 585 4616
>> http://www.cnb.csic.es/~mjvanraaij
>> 
>> 
>> 
>> 
>> 
>> On 3 Jun 2014, at 09:20, Tim Gruene wrote:
>> 
>>> Dear Bing,
>>> 
>>> can you post the exact command you were using, please? Also please check
>>> with a different PDB file. In case you are using baverage from the
>>> command line, can you make sure you are actually using the program from
>>> ccp4 by typing 'which baverage' at the command prompt?
>>> 
>>> Regards.
>>> Tim
>>> 
>>> On 06/02/2014 10:16 PM, Wang, Bing wrote:
 
 Hi CCP4,
 
 Recently when I input my pdb file and run baverage in the ccp4 suite to 
 check the temperature factor, it always tells me "No tables were found in 
 this file." Could you tell me how to fix this problem? Or is there another 
 program instead of baverage I could use to check the temperature factor?
 
 Thanks!
 
 Bing
 
>>> 
>>> -- 
>>> Dr Tim Gruene
>>> Institut fuer anorganische Chemie
>>> Tammannstr. 4
>>> D-37077 Goettingen
>>> 
>>> GPG Key ID = A46BEE1A
>>> 
> 
> 
> -- 
> Scanned by iCritical.
> 


Re: [ccp4bb] baverage: no tables were found in this file

2014-06-03 Thread Ed Pozharski
This may not work if a program implements its own algorithm for parsing 
command-line parameters.



 Original message 
From: Tim Gruene  
Date: 06/03/2014 8:27 AM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] baverage: no tables were found in this file 

Hi Eleanor,

yes, it does. At least on Linux and OpenBSD (i.e. 'real' UNIX) typing
#> mkdir "dir with spaces"
#> cd dir\ with\ spaces/
#> touch "file with spaces"
#> echo "Some text" > file\ with\ spaces
causes no problems at all. Changing code reliably to include accents,
though, might be much more work than enclosing words with double quotes.

Best,
Tim

On 06/03/2014 02:23 PM, Eleanor Dodson wrote:
> I thought the addition of " ..." around file names got round this
> problem
> 
> eg "This_is_Mine.pdb" and "That is Yours.pdb"
> 
> Eleanor
> 
> 
> 
> 
> On 3 June 2014 13:15, Eugene Krissinel
>  wrote:
> 
>> I am afraid that it is us who will need to conform eventually :),
>> same to say about "Program Files (x86)" and the overall culture
>> in Windows, the system where some 40% of CCP4 users have
>> chosen to work.
>> 
>> Eugene
>> 
>> 
>> On 3 Jun 2014, at 13:03, Mark J van Raaij wrote:
>> 
>>> completely agree with avoiding spaces in paths and file names,
>>> but
>> Google Drive is very handy for sharing files, so every once in a
>> while I forget to move a file someone shares with me before
>> running Unix programs on it and get strange errors. Depending on
>> how awake one is at the time finding out the source can be
>> time-consuming...Google should really remove the space!
>>> 
>>> Mark J van Raaij Lab 20B Dpto de Estructura de Macromoleculas 
>>> Centro Nacional de Biotecnologia - CSIC c/Darwin 3 E-28049
>>> Madrid, Spain tel. (+34) 91 585 4616 
>>> http://www.cnb.csic.es/~mjvanraaij
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 3 Jun 2014, at 13:47, Eugene Krissinel wrote:
>>> 
 I take this chance to to confirm once more, as publicly as
 possible,
>> that file paths with spaces are discouraged in today's CCP4.
>> This inconvenience originates from ancient times in computing
>> when good half of CCP4 was written and when spaces were
>> disallowed on file system nodes.
 
 Please take a notice of this fact as CCP4 core still receives
 (albeit
>> infrequent) bug reports, where surprising behaviour is due to
>> using file paths with white spaces.
 
 Fixing this has proved to be a hard problem, purely because
 of
>> technical choices made quite a number of years ago. But good news
>> are that this limitation will be removed in new CCP4 Gui under
>> development.
 
 Eugene
 
 On 3 Jun 2014, at 08:23, Mark J van Raaij wrote:
 
> This also occurred to me once where the file path had a
> space,(/Google
>> Drive/), when I moved the file somewhere else it worked. I was
>> using baverage from the CCP4i GUI.
> 
> Mark J van Raaij Lab 20B Dpto de Estructura de
> Macromoleculas Centro Nacional de Biotecnologia - CSIC 
> c/Darwin 3 E-28049 Madrid, Spain tel. (+34) 91 585 4616 
> http://www.cnb.csic.es/~mjvanraaij
> 
> 
> 
> 
> 
> On 3 Jun 2014, at 09:20, Tim Gruene wrote:
> 
>> Dear Bing,
>> 
>> can you post the exact command you were using, please?
>> Also please
>> check
>> with a different PDB file. In case you are using baverage
>> from the command line, can you make sure you are actually
>> using the program
>> from
>> ccp4 by typing 'which baverage' at the command prompt?
>> 
>> Regards. Tim
>> 
>> On 06/02/2014 10:16 PM, Wang, Bing wrote:
>>> 
>>> Hi CCP4,
>>> 
>>> Recently when I input my pdb file and run the baverage
>>> in the ccp4
>> suit to check the temperature factor, it always tell me "No
>> tables were fund in this file." Could you tell me how to fix this
>> problem? Or is there another software instead of baverage I could
>> use to check the temperature factor?
>>> 
>>> Thanks!
>>> 
>>> Bing
>>> 
>> 
>> -- Dr Tim Gruene Institut fuer anorganische Chemie 
>> Tammannstr. 4 D-37077 Goettingen
>> 
>> GPG Key ID = A46BEE1A
>> 
 
 
 -- Scanned by iCritical.
 
>>> 
>> 
>> 
>> -- Scanned by iCritical.
>> 
>> 
> 

-- 
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A



Re: [ccp4bb] (high) values of R-factors in outermost resolution shell

2014-06-02 Thread Ed Pozharski
My suggestion is to ignore Rmerge when making decisions about the resolution 
cutoff.  While CC1/2 may still perhaps qualify for the "recent discussion" tag 
(is two years recent?), the deeply flawed nature of the Rmerge concept has been 
known for over a decade.



 Original message 
From: sreetama das  
Date: 06/02/2014 4:27 AM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] (high) values of R-factors in outermost resolution shell 
Dear All,
  What are reasonable values of Rmerge in the outermost resolution 
shell? 

Some of the recent discussions suggest going to those shells where <I/sigI> 
~2 and CC1/2 = 0.5. But I am getting Rmerge & Rmeas > 1 in the outermost shell 
for those values of <I/sigI> and CC1/2, and I don't think that makes any 
sense. Reducing the resolution cut-off during data reduction & scaling (aimless) 
reduces the R-values, but I am not sure how much I should reduce the resolution 
(if at all). 

Following are the results from the aimless log file:
At a resolution cut-off of 1.62A:
   Overall  InnerShell  OuterShell
Low resolution limit   42.12 42.12  1.65
High resolution limit   1.62  8.87  1.62

Rmerge  (within I+/I-) 0.071 0.019 1.325
Rmerge  (all I+ and I-)0.074 0.020 1.381
Rmeas (within I+/I-)   0.078 0.021 1.446
Rmeas (all I+ & I-)0.077 0.022 1.441
Rpim (within I+/I-)0.031 0.009 0.575
Rpim (all I+ & I-) 0.022 0.007 0.410
Rmerge in top intensity bin        0.031     -     - 
Total number of observations  311022  1839 14876
Total number unique25311   184  1212
Mean((I)/sd(I)) 19.2  52.7   2.0
Mn(I) half-set correlation CC(1/2) 1.000 1.000 0.740
Completeness   100.0  99.4 100.0
Multiplicity12.3  10.0  12.3

At 1.6A, the <I/sigI> and CC1/2 in the outermost shell are lower, and the 
R-merge,meas,pim are higher.

looking forward to your suggestions,
thanking you,
sreetama


Re: [ccp4bb] negative density around disulfide bond

2014-06-02 Thread Ed Pozharski
Try refining without the disulfide bond - from the way the density looks, it 
might work.  Whether this is what happens in vivo is a different question 
entirely. 



 Original message 
From: Eze Chivi  
Date: 06/02/2014 1:08 AM (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] negative density around disulfide bond 
Hello, when I refine my structure, I see negative density around the 
disulfide bond. I have 7 copies per ASU, and I can see this density in many of 
them. In some cases, I see positive density also (negative in the center of the 
straight line linking the S atoms, and positive on both sides). What can I try 
to solve it? Is it due to radiation damage? An alternative conformation (partial 
oxidation)? Incorrect disulfide geometry parameters? My resolution is 2.1 A, 
R/Rfree are around 0.220/0.243, and the results are similar with refmac5, phenix 
and PDB-REDO. Please find two example pictures in the attachment.
Thanks for your help!


Ezequiel


Re: [ccp4bb] distinguish ligand binding sites within a protein

2013-11-20 Thread Ed Pozharski
Baerbel,

Certainly, constraints of the crystal lattice affect everything.  The
magnitude of the influence is what is in question here. Let's do the
numbers (this is admittedly simplistic but reflects the general trends well).

Let's say you observe a difference in occupancies of 50%/75% for two
identical sites.  This means that the difference in binding affinities
corresponds to a difference in the free energy of binding of
DDG = RT*ln(K1/K2).  If the ligand is in excess with respect to the
protein, K ~ (1-occ)*L/occ, and thus the estimate of DDG is ~0.6 kcal/mol.
The point is that it does not take much free energy to produce a
difference in occupancy.  Yet anyone who has ever done ITC would tell you
that such a small difference would be hard to detect with that technique.
So the prior is that Wei sees a significant DDG.
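
For the record, the arithmetic behind that estimate (at T = 298 K; note the
free ligand concentration L cancels in the ratio):

import math

R, T = 1.987e-3, 298.0            # kcal/(mol*K), K
occ1, occ2 = 0.50, 0.75
# K ~ (1-occ)*L/occ for each site; L cancels when taking the ratio.
ratio = ((1 - occ1) / occ1) / ((1 - occ2) / occ2)         # K1/K2 = 3
print("DDG = %.2f kcal/mol" % (R * T * math.log(ratio)))  # ~0.65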

Determining affinities from crystal soaks is, of course, overkill.
There are other easier methods that do not pose the problem you mention.
However, the original post was not about determining affinities, but
rather which of the two sites has the higher affinity.   Is it possible
that lattice constraints invert the affinity ratio? Sure.  But given
that the prior of the "lattice DDG"=0, it is more likely that the sign
of the affinity ratio in crystal is the same as in solution.  The
likelihood of this, of course, depends on the value of "solution DDG"
with respect to "lattice DDG".

Again, mutagenesis works too, but if crystals and beamtime are
available, such experiment makes sense, at least in my opinion.

Cheers,

Ed.

On Wed, 2013-11-20 at 08:20 +0100, Bärbel Blaum wrote:
> Hi Ed,
> 
> you are right about the original question, but what I mean is this: if  
> the occupancies (and B-factors) differ so much in crystals with  
> IDENTICAL binding sites, i.e. identical affinities, does this not show  
> that occupancies (and B-factors) do not reflect affinities alone, but  
> equally local packing? There might be individual cases in which such  
> effects can be neglected, but generally I think trying to determine  
> affinities from crystal soaks is, hmm, not very good practice, simply  
> because there are other dedicated methods to do it that suffer less  
> from side effects. Including the docking approach.
> 
> Kind regards, Baerbel
> 
> 
> Quoting Ed Pozharski :
> 
> > If I understand the original post correctly, the binding sites in
> > question are not chemically identical.  While it's possible that lattice
> > may invert the order in which sites are occupied, it is not very likely
> > given that affinity gap is sufficient to be observable by ITC.
> >
> > Mutagenesis is a good option too.
> >
> > On Tue, 2013-11-19 at 17:12 +0100, Bärbel Blaum wrote:
> >> Hello,
> >>
> >> we work with proteins that have typically several chemically identical
> >> binding sites (viral capsid proteins fully assembled or as multimeric
> >> assembly-intermediates). Depending on how long at which concentrations
> >> they are soaked the chemically identical ligand pockets within one
> >> asymmetric unit are typically occupied to different levels purely
> >> because of individual crystal contacts and accessibility. I therefore
> >> think that neither soaking with different concentrations nor B-factor
> >> analysis are solid methods to determine some sort of relative
> >> affinities. I'd suggest to design mutants for either binding site and
> >> ITC measurements with the mutant proteins. This might also tell you if
> >> some sort of co-op exists between both sites.
> >>
> >> Baerbel
> >>
> >> Quoting Ed Pozharski :
> >>
> >> > IMHO, while explaining binding affinity from a structure is fun, it does
> >> > not prove anything.  Assuming that I understand your situation
> >> > correctly, you can (relatively) easily find out from experiment which
> >> > pocket has higher affinity.  Just do soaks with different ligand
> >> > concentrations - the expectation is that the weaker binding site will
> >> > become partially occupied first.
> >> >
> >> > On Tue, 2013-11-19 at 04:58 +, Xiaodi Yu wrote:
> >> >> Hi Wei:
> >> >>
> >> >> Based on the structure, you can calculate the binding surface between
> >> >> the protein and the ligand. Maybe the two binding pockets will give
> >> >> you two different numbers. And the larger one usually can have the
> >> >> higher binding affinity.  You also can analyse how the ligand
> >> >> interacts with the protein through hydrophobic or electrostatic
> >> >> interactions, etc. Lastly, you may also 

Re: [ccp4bb] distinguish ligand binding sites within a protein

2013-11-19 Thread Ed Pozharski
If I understand the original post correctly, the binding sites in
question are not chemically identical.  While it's possible that the lattice
may invert the order in which sites are occupied, it is not very likely
given that the affinity gap is sufficient to be observable by ITC.

Mutagenesis is a good option too.

On Tue, 2013-11-19 at 17:12 +0100, Bärbel Blaum wrote:
> Hello,
> 
> we work with proteins that have typically several chemically identical  
> binding sites (viral capsid proteins fully assembled or as multimeric  
> assembly-intermediates). Depending on how long at which concentrations  
> they are soaked the chemically identical ligand pockets within one  
> asymmetric unit are typically occupied to different levels purely  
> because of individual crystal contacts and accessibility. I therefore  
> think that neither soaking with different concentrations nor B-factor  
> analysis are solid methods to determine some sort of relative  
> affinities. I'd suggest to design mutants for either binding site and  
> ITC measurements with the mutant proteins. This might also tell you if  
> some sort of co-op exists between both sites.
> 
> Baerbel
> 
> Quoting Ed Pozharski :
> 
> > IMHO, while explaining binding affinity from a structure is fun, it does
> > not prove anything.  Assuming that I understand your situation
> > correctly, you can (relatively) easily find out from experiment which
> > pocket has higher affinity.  Just do soaks with different ligand
> > concentrations - the expectation is that the weaker binding site will
> > become partially occupied first.
> >
> > On Tue, 2013-11-19 at 04:58 +, Xiaodi Yu wrote:
> >> Hi Wei:
> >>
> >> Based on the structure, you can calculate the binding surface between
> >> the protein and the ligand. Maybe the two binding pockets will give
> >> you two different numbers. And the larger one usually can have the
> >> higher binding affinity.  You also can analyse how the ligand
> >> interacts with the protein through hydrophobic or electrostatic
> >> interactions, etc. Lastly, you may also compare the B factors of
> >> the ligand or the protein binding pocket regions after refining
> >> the structure. These things may give you some hints about which
> >> binding site is more strong.
> >>
> >> Dee
> >>
> >>
> >> __
> >> Date: Mon, 18 Nov 2013 22:45:58 -0500
> >> From: wei.shi...@gmail.com
> >> Subject: Re: [ccp4bb] distinguish ligand binding sites within a
> >> protein
> >> To: CCP4BB@JISCMAIL.AC.UK
> >>
> >> Thank you so much for the suggestions, Tomas! Yes, my ligand is a
> >> small molecule. I have the crystal structure of the ligands bound to
> >> the protein, do I still need to computationally dock the ligand to the
> >> two pockets, can I calculate the parameters of binding directly using
> >> the crystal structure?
> >>
> >> Best,
> >> Wei
> >>
> >>
> >>
> >> On Mon, Nov 18, 2013 at 9:03 PM, Tomas Malinauskas
> >>  wrote:
> >> Dear Wei Shi,
> >> is your ligand a small molecule? If it is a small molecule, I
> >> would
> >> try to computationally dock the small molecule to two pockets
> >> separately using AutoDock, and look at the estimated free
> >> energies of
> >> binding.
> >> Best wishes,
> >> Tomas
> >>
> >> On Mon, Nov 18, 2013 at 8:55 PM, Wei Shi
> >>  wrote:
> >> > Hi all,
> >> > I got the crystal structure of a transcription factor, and
> >> every monomer
> >> > binds two molecules of the same ligand in different binding
> >> pockets. And I
> >> > also did the ITC experiment, titrating the ligand into the
> >> protein, and got
> >> > a U-shaped curve. The binding affinity for the first binding
> >> site is higher
> >> > than the second binding site.
> >> > I am wondering whether I could computationally determine
> >> from the
> >> > protein-ligand complex structure that which binding site has
> >> higher affinity
> >> > for the ligand and correlate the binding sites with the
> >> parameters I got
> >> > from ITC experiment.
> >>  

Re: [ccp4bb] distinguish ligand binding sites within a protein

2013-11-19 Thread Ed Pozharski
IMHO, while explaining binding affinity from a structure is fun, it does
not prove anything.  Assuming that I understand your situation
correctly, you can (relatively) easily find out from experiment which
pocket has higher affinity.  Just do soaks with different ligand
concentrations - the expectation is that the weaker binding site will
become partially occupied first.

On Tue, 2013-11-19 at 04:58 +, Xiaodi Yu wrote:
> Hi Wei:
> 
> Based on the structure, you can calculate the binding surface between
> the protein and the ligand. Maybe the two binding pockets will give
> you two different numbers. And the larger one usually can have the
> higher binding affinity.  You also can analyse how the ligand
> interacts with the protein through hydrophobic or electrostatic
> interactions, etc. Lastly, you may also compare the B factors of
> the ligand or the protein binding pocket regions after refining
> the structure. These things may give you some hints about which
> binding site is more strong.
> 
> Dee
> 
> 
> __
> Date: Mon, 18 Nov 2013 22:45:58 -0500
> From: wei.shi...@gmail.com
> Subject: Re: [ccp4bb] distinguish ligand binding sites within a
> protein
> To: CCP4BB@JISCMAIL.AC.UK
> 
> Thank you so much for the suggestions, Tomas! Yes, my ligand is a
> small molecule. I have the crystal structure of the ligands bound to
> the protein, do I still need to computationally dock the ligand to the
> two pockets, can I calculate the parameters of binding directly using
> the crystal structure? 
> 
> Best,
> Wei 
> 
> 
> 
> On Mon, Nov 18, 2013 at 9:03 PM, Tomas Malinauskas
>  wrote:
> Dear Wei Shi,
> is your ligand a small molecule? If it is a small molecule, I
> would
> try to computationally dock the small molecule to two pockets
> separately using AutoDock, and look at the estimated free
> energies of
> binding.
> Best wishes,
> Tomas
> 
> On Mon, Nov 18, 2013 at 8:55 PM, Wei Shi
>  wrote:
> > Hi all,
> > I got the crystal structure of a transcription factor, and
> every monomer
> > binds two molecules of the same ligand in different binding
> pockets. And I
> > also did the ITC experiment, titrating the ligand into the
> protein, and got
> > a U-shaped curve. The binding affinity for the first binding
> site is higher
> > than the second binding site.
> > I am wondering whether I could computationally determine
> from the
> > protein-ligand complex structure that which binding site has
> higher affinity
> > for the ligand and correlate the binding sites with the
> parameters I got
> > from ITC experiment.
> > Thank you so much!
> >
> > Best,
> > Wei
> 
> 
> 

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /


Re: [ccp4bb] 100% Rmerge in high resolution shell

2013-11-19 Thread Ed Pozharski
Dear Kay,

I wonder what is your opinion of the following proposition.

"None of the data quality indicators derived from data alone matter too
much".

Let me explain what I mean by this.

Ultimately, I truly don't care what value of Rmerge, Rpim, or even CC1/2
data processing produces from the set of frames I toss at it.  Surely it
is important to keep an eye on them to verify that the dataset is kosher and
to obtain an *initial* estimate of the resolution limit of useful data.
But the actual values never deter me from trying to solve the structure.
If it cannot be solved - well, then none of the aforementioned
indicators matter at all.  If it is solved, the only remaining question
is how far the useful data goes.  And that should be determined using the
Karplus-Diederichs (KD) test.  I do use CC1/2~0.5 as initial resolution
cutoff these days, but before finalizing any model I run the KD test.  I
do look at where the Rpim, I/sigma and CC1/2 end up at the resolution
edge, but only out of curiosity and to adjust my perception of how they
correlate with true resolution.

And I think that efforts should be targeted at optimizing the KD test as a
tool rather than being distracted by outdated approaches that were
proposed in computationally-handicapped times.

It is entirely possible that all this is exactly what you said, just
with different wording.  But I guess more wording is still needed given
that people keep asking about Rmerge.  

Cheers,

Ed.

On Tue, 2013-11-19 at 14:22 +, Kay Diederichs wrote:
> Hi Jim,
> 
> of course the issue of crystallographic data quality indicators deserves a 
> somewhat more appropriate (or at least more permanent, and peer-reviewed) 
> means of dissemination than CCP4BB. Nevertheless I'll sum up some of the most 
> important points I can think of:
> 
> A) all data quality indicators measure precision, not accuracy
> B) there are those data quality indicators that measure the precision of 
> unmerged data:
> (Rsym=)Rmerge, Rmeas, (I/sigma)_unmerged 
> and those that measure the precision of merged data:
> Rpim, Rsplit (the FEL community uses this; same as R_mrgd_I - see 
> Diederichs&Karplus 1997), CC1/2, (I/sigma)_merged
> The merged indicators usually differ by a factor of sqrt(m) from their 
> unmerged counterparts, where m is multiplicity. Rsplit (~R_mrgd_I) and CC1/2 
> compare random half-datasets which may be more robust than just hoping that 
> the explicit sqrt(m) law holds (it only holds for unrelated errors). There is 
> no unmerged counterpart of CC1/2.
> C) Since downstream steps use intensities, it is preferable to use a data 
> quality indicator that does not require sigma to be estimated, because the 
> authors of the different data processing programs/algorithms have different 
> ideas how this should be done. This rules out I/sigma as a useful quality 
> indicator - at least as soon as different programs look at the same data. 
> D) Merged data quality indicators are more useful because we are using merged 
> data for downstream steps (phasing, molecular replacement, refinement), so we 
> need to know _their_ precision, not that of the unmerged data.
> E) Rpim and Rsplit are calculated from intensities and have a different 
> asymptotic behaviour than model R-values (Rwork, Rfree), so they cannot be 
> meaningfully be compared with model R-values (i.e. their numerical value 
> tells you nothing about the Rwork/Rfree your model can be refined to). This 
> is very different from CC1/2 - it can be used to calculate CC*, a quantity 
> that is the upper limit of what the CC of the model intensities against the 
> experimental intensities can reach. 
> 
> I'll stop here. Most of this may be at variance with what we were all brought 
> up with, but it's time for a change!
> 
> best,
> 
> Kay
> 
> On Tue, 19 Nov 2013 13:18:19 +, Jim Pflugrath  
> wrote:
> 
> >Graeme wrote:
> >"... Rpim is much more instructive. ... as each of these tells something 
> >different."
> >
> >I have to ask:
> >"Why is Rpim much more instructive?  I'm trying to figure this out still.  
> >Can one please summarize what are best practices with all these numbers and 
> >how each of these tells something different?"
> >
> >Another problem that I see is that folks can adjust their sigmas many 
> >different ways without knowing they have adjusted their sigmas.  And they 
> >can be adjusted incorrectly when they are adjusted.
> >
>BTW, Graeme is correct that lots of multiple low I/sigI observations for 
>each Bragg reflection in a resolution shell will lead to 100% (or higher) 
>Rmerge with <I/sigI> of 3.  This assumes no systematic errors and only 
> >randomly distributed random errors (a rare if not impossible situation, I 
> >would think).  I will defer to others about what the relevance of that is.
> >
> >Thanks for any insights, Jim
> >
> >
> >
> >From: CCP4 bulletin board [CCP4BB@JISCMAIL.AC.UK] on behalf of Graeme Winter 
> >[graeme.win...@gmail.com]
> >Sent: Tue

Re: [ccp4bb] Comparison of Water Positions across PDBs

2013-10-29 Thread Ed Pozharski

http://www.ccp4.ac.uk/html/watertidy.html


On 10/29/2013 04:43 PM, Elise B wrote:

Hello,

I am working on a project with several (separate) structures of the 
same protein. I would like to be able to compare the solvent molecules 
between the structures, and it would be best if the waters that exist 
in roughly the same position in each PDB share the same residue 
number. Basically, I want to compare solvent molecule coordinates and 
assign similar locations the same name in each structure.


 What would be the best strategy for re-numbering the water molecules 
such that those with similar coordinates in all the structures receive 
the same residue number? I'd appreciate any suggestions.


Elise Blankenship




--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] Ligand binding protein partner search

2013-10-14 Thread Ed Pozharski
It's definitely possible to rig, say, autodock to run the same small
molecule over every protein in the PDB.  Results, of course, should be
taken with a grain of salt.
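
A skeleton of such a rig (hypothetical paths; it assumes the receptors have
already been converted to PDBQT, that a docked/ output directory exists, and
that a sensible search box has been chosen - all nontrivial steps glossed
over here):

import glob
import subprocess

# Sketch: dock one ligand against a directory of receptors with AutoDock
# Vina. The fixed box below is a placeholder - blind docking really needs
# a per-receptor box covering the whole molecule.
for receptor in glob.glob("receptors/*.pdbqt"):
    out = receptor.replace("receptors/", "docked/")
    subprocess.run(
        ["vina",
         "--receptor", receptor,
         "--ligand", "ligand.pdbqt",
         "--center_x", "0", "--center_y", "0", "--center_z", "0",
         "--size_x", "40", "--size_y", "40", "--size_z", "40",
         "--out", out],
        check=True,
    )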

On Mon, 2013-10-14 at 20:53 +0900, L wrote:
> Hi all,
> 
> 
> I'm looking for a way to find/predict protein partner(s) for our
> ligand (small chemical; MW<500 Da). 
> 
> 
> There are lots of servers and programs to find/predict a ligand for a
> known protein structure, but I can hardly find a way to discover potential
> protein partner(s) for a known ligand. 
> 
> 
> That would be great if anyone give me a clue. Web server, software,
> any suggestion are all appreciated. 
> 
> 
> Thank you
> 
> 
> L

-- 
"Hurry up before we all come back to our senses!"
   Julian, King of Lemurs


Re: [ccp4bb] Off-topic: Correlation of crystallographic B factors with NMR relaxation parameters

2013-10-14 Thread Ed Pozharski
Florian,

B-factors will likely correlate with any property that changes depending
on the distance from the protein surface.  Whether such correlation means
causation is always difficult to prove.

Cheers,

Ed.

-- 
"I'd jump in myself, if I weren't so good at whistling."
   Julian, King of Lemurs


Re: [ccp4bb] Is there someone working with Lipids?

2013-09-30 Thread Ed Pozharski
> May I ask a question?
> 

This is a catch-22 situation.  If you ask permission to ask a question,
you are already asking a question.  Answering this with no would
probably create a wormhole.  I'd also like to see a bulletin board where
asking questions requires separate permission.

> When we prepare lipids, we always use chloroform to dissolve the lipids,
> then resuspend them in buffer. But sometimes we can dissolve lipids with
> a detergent, for example E.coli total lipids. Why do we still need to
> dissolve them in chloroform first, as in the standard method?

Chloroform is a good way to store lipids (keep it at -80 to minimize
evaporation, use teflon cap liners, warm up prior to use to avoid water
absorption and top off with argon after each withdrawal).  Chloroform
can be easily removed (the bulk with a gentle stream of argon under the hood
and the residual by 1-2 hours of desiccation under an oil pump) and you can
then have very good control over the buffer content (charged lipids will come
with counterions, of course).  Thus, it's a very robust general
procedure.

Using "detergents" to dissolve lipids is much messier, I presume.  Of
course, this depends on what you are trying to accomplish.

Cheers,

Ed.

-- 
Bullseye!  Excellent shot, Maurice.
  Julian, King of Lemurs.


Re: [ccp4bb] Low 280 absorbance imidazole?

2013-08-22 Thread Ed Pozharski
On Thu, 2013-08-22 at 14:22 -0400, Bosch, Juergen wrote:
> But when it sits and crystallizes it is cleaner - unless some
> opportunistic fungal contamination helps you trim off those nasty
> loops that you would have omitted anyhow from your model.


Jurgen,

Good point and admittedly I never considered that.  On the other hand,
when the supernatant is injected into the column on FPLC, other components
that may harm the protein are either rapidly removed from the
environment or immobilized on the resin and can do no further damage.
Does this mean that the batch method exposes the protein for a longer time as
you incubate it with the resin?

Of course, we are splitting hairs (at least I am) - both methods work
just fine most of the time and one can find problematic examples with
both.

Cheers,

Ed.

-- 
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] Low 280 absorbance imidazole?

2013-08-22 Thread Ed Pozharski
On Thu, 2013-08-22 at 13:58 -0400, Bosch, Juergen wrote:
> well if we hit the timer after lysis, say via cell disruptor then I
> have my eluted protein in less than 1 hour, including 40 minutes batch
> binding.

then proceeds to wait six weeks for crystals to appear... :)

Cheers,

Ed.

PS.  There are many ways to skin a kangaroo and ultimately indeed all
things serve the Beam.

-- 
"I'd jump in myself, if I weren't so good at whistling."
   Julian, King of Lemurs


Re: [ccp4bb] Low 280 absorbance imidazole?

2013-08-22 Thread Ed Pozharski
On Thu, 2013-08-22 at 10:07 -0400, Bosch, Juergen wrote:
> Yes, I'm also surprised why people run gradients for the capturing
> step ? 

Because we can.  Joking aside, I've seen some examples where protein
eluted at relatively low imidazole and upon running the gradient there
remains some (minimal) overlap with non-specific binders.  Mainly, I've just
never seen a consensus on what the right imidazole concentration for
the wash buffer is (10mM? 50mM? it may even depend on expression
circumstances); running the gradient takes that uncertainty away.  Also, the
batch method probably leaves more contamination behind.

One can run imidazole gradient in 1-2 hours, batch is not really much
faster, if at all.  So if one has FPLC access, doing imidazole gradient
seems like a good standard policy.  Benefits might be minor and rare,
but it is not much more work and certainly not worse than batch in any
respect.

My two cents (which in Russia turns into three rusty kopecks),

Ed.

-- 
After much deep and profound brain things inside my head, 
I have decided to thank you for bringing peace to our home.
Julian, King of Lemurs


Re: [ccp4bb] Low 280 absorbance imidazole?

2013-08-21 Thread Ed Pozharski
According to Qiagen


> Since imidazole absorbs UV radiation at 280 nm, an elution profile
> measured at 280 nm while purifying a 6xHis tagged protein by FPLC will
> show an increase in absorbance above the background signal allowing
> quantitation of your protein. The absorbance of imidazole can vary
> depending on its source and purity, but elution buffer containing 250
> mM imidazole usually has an A280 of 0.2–0.4.

Cheers,
Ed.


On Wed, 2013-08-21 at 11:25 -0400, Edward A. Berry wrote:
> Would you happen to know what the A280 of a 1M (or .1 M) solution is?
> I wonder if this absorbance is really due to impurities or is the tail of a 
> weak absorbance band of imidazole itself. I 
> notice A280 is not one of the properties sigma lists for its imidazole.
> 
> Jan wrote:
> > Hi Bernhard,
> > this one works for us:
> > Sigma BioUltra imidazole, >99.5% by GC, 56749
> >
> > Cheers,
> > Jan
> > --
> > Jan Abendroth
> > Emerald BioStructures
> > Seattle / Bainbridge Island WA, USA
> > home: Jan.Abendroth_at_gmail.com
> > work: JAbendroth_at_embios.com
> > http://www.emeraldbiostructures.com
> >
> > On Aug 21, 2013, at 7:33 AM, Bernhard Rupp  > > wrote:
> >
> >> Hi Fellows,
> >> could someone please point me towards the source of a known high purity 
> >> imidazole
> >> with low absorbance at 280 nm? I am facing the problem of detecting a low 
> >> absorption protein
> >> in high imidazole background after IMAC gradient elution. In the UV 
> >> spectra of the
> >> 2 imidazoles I checked there is some contaminant that absorbs at 280…
> >> Thx, BR
> >> 
> >> Bernhard Rupp
> >> Marie Curie Incoming International Fellow
> >> Innsbruck Medical University
> >> Schöpfstrasse 41
> >> A 6020 Innsbruck – Austria
> >> +43 (676) 571-0536
> >> bernhard.r...@i-med.ac.at 
> >> 
> >> Dept. of Forensic Crystallography
> >> k.-k. Hofkristallamt
> >> Vista, CA 92084
> >> 001 (925) 209-7429
> >> b...@ruppweb.org 
> >> b...@hofkristallamt.org 
> >> http://www.ruppweb.org/
> >> ---
> >

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ed Pozharski

On 08/07/2013 05:54 PM, Nat Echols wrote:
Personally, if I need to change a chain ID, I can use Coot or pdbset 
or many other tools.  Writing code for this should only be necessary 
if you're processing large numbers of models, or have a spectacularly 
misformatted PDB file.  Again, I'll repeat what I said before: if it's 
truly necessary to view or edit a model by hand or with custom shell 
scripts, this often means that the available software is deficient.  
PLEASE tell the developers what you need to get your job done; we 
can't read minds.


Nat,

I don't think anyone here really means that the only way to change a 
chain ID is to write, say, a perl script.  But an interpreter of the 
kind advocated by James (as much as I have hijacked/misinterpreted his 
vision) could indeed be very useful for people pursuing simple 
bioinformatics projects and new ways to analyse structural models. While 
I understand your view that everyone should seek assistance from 
"developers" with every problem encountered, I also recall some 
reasonable idea about self-sufficiency that should cover scientific 
research (something like "give man a fish and you feed him for a day, 
teach him to fish and he starts paying taxes"... something along these 
lines ;).  There is a difference betweens tools that allow to easily 
perform useful non-standard analysis and highly specialized tools that 
strive to cover every situation imaginable.


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ed Pozharski

James,

On 08/07/2013 05:36 PM, James Stroud wrote:
Anyone can learn Python in an hour and a half. 


Isn't this a bit of an exaggeration?  Python is designed to be easy to 
learn, but we are probably talking about different definitions of "learning"
and "anyone".



I.e. programs would look like this

---
GRAB protein FROM FILE "best_model_ever.cif";
SELECT CHAIN A FROM protein AS chA;
SET chA BFACTORS TO 30.0;
GRAB data FROM FILE "best_data_ever.cif";
BIND protein TO data;
REFINE protein USING BUSTER WITH TLS+ANISO;
DROP protein INTO FILE "better_model_yet.cif";
---

Not necessarily a bad idea but now through the fog of time I remember something oddly 
reminiscent... ah, CNS! (for those googling for it it's not the "central nervous 
system" :).
Although a little too much like natural language, it is not a bad idea. But, 
where is the link describing the layer of CNS that looks like that?


I should probably use <sarcasm> markup next time to prevent my poor
attempt at humorous tribute to CNS from being understood so literally.
At the very least you might agree that CNS is the closest thing we ever
had to an MX-oriented general purpose interpreter.  Your quote is also
from the "below-the-magic-line-do-not-change" area of a CNS script.


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ed Pozharski

On 08/07/2013 03:54 PM, James Stroud wrote:

On Aug 7, 2013, at 1:06 PM, Ed Pozharski wrote:

On 08/07/2013 01:51 PM, James Stroud wrote:

In the long term, the MM structure community should perhaps get its inspiration 
from SQL

For this to work, a particular interface must monopolize access to structural 
data.

Not necessarily, although the alternative pathway might be more idealistic and 
hence unrealistic.

All that needs to happen is that the community agree on

1. What is the finite set of essential/useful attributes of macromolecular 
structural data.
2. What is the syntax of (a) accessing and (b) modifying those attributes.
3. What is the syntax of selecting subsets of structural data based on those 
attributes.

The resulting syntax (i.e. language) itself should be terse, easy to learn, 
easy to use, and preferably easy to implement.

If such a standard is created, then I believe awk-ing/grep-ing/sed-ing/etc PDBs 
and mmCIFs would quickly become historical.

James

James,

frankly, I am not sure which part of your description is not equivalent 
to "monopolistic interface".


If I understand your proposal and reference to SQL correctly, you want 
some scripting language that sounds like simple English.  Is the 
advantage over existing APIs here that one does not need to learn 
Python, C++, (or, heaven forbid, FORTRAN)?  I.e. programs would look 
like this


---
GRAB protein FROM FILE "best_model_ever.cif";
SELECT CHAIN A FROM protein AS chA;
SET chA BFACTORS TO 30.0;
GRAB data FROM FILE "best_data_ever.cif";
BIND protein TO data;
REFINE protein USING BUSTER WITH TLS+ANISO;
DROP protein INTO FILE "better_model_yet.cif";
---

Not necessarily a bad idea but now through the fog of time I remember 
something oddly reminiscent... ah, CNS! (for those googling for it it's 
not the "central nervous system" :).


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] mmCIF as working format?

2013-08-07 Thread Ed Pozharski

On 08/07/2013 01:51 PM, James Stroud wrote:

In the long term, the MM structure community should perhaps get its inspiration 
from SQL
For this to work, a particular interface must monopolize access to
structural data.  Then the maintainers of that victorious interface could
change the underlying format whichever way they want while supplying a
never-ending stream of useful features.  And all other programs would be
just frontends to the interface.  As long as the data format remains easily
readable and there is more than one person willing to fiddle with code,
the persistence, or at the very least backward compatibility, of the data
format will remain a (minor to me) issue.  It is also important that it
is much easier to write a pdb parser in your favourite language than to
implement a general-purpose relational database management system.


For full disclosure, I personally do not share the apocalyptic feeling 
about transition to mmCIF.


Cheers,

Ed.


--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] Problems with SANS data analysis

2013-08-07 Thread Ed Pozharski
This question may be better suited for more small-angle-oriented forum, 
e.g.

http://www.saxier.org/forum/


On 08/07/2013 11:22 AM, Remec, Mark wrote:


Dear CCP4bb,

I have a few questions concerning SANS data recently collected that 
I'm having trouble analyzing. The data was collected at 2 different 
detector distances (4m, 2.5m) to achieve higher q-range, but I worry 
that the curves don't overlap enough at intermediate q, which might 
indicate a problem with the data. The links below are pictures of the 
corresponding datasets, before truncating the 4m high-q data and 
merging them into one. Is there a problem evident with the data, or am 
I imagining a problem?


http://postimg.org/image/qb00y20qr/

http://postimg.org/image/8trbp7akj/

http://postimg.org/image/hni86axj7/

http://postimg.org/image/3sjxnu343/

http://postimg.org/image/4ysj0dgsj/

http://postimg.org/image/9ypz8bmf7/

http://postimg.org/image/m358pazb7/

http://postimg.org/image/jzuthmzib/

My second question concerns the values obtained in the analysis of the 
final scattering curves. The second sample in my experiment shows 
serious deviation in the values obtained for I(0) and Rg by Guinier 
analysis compared to the values obtained by the P(r) analysis. In 
other words, either the P(r) values match the Guinier and the P(r) fit 
is terrible, or else the P(r) fit is good but doesn't match the 
Guinier at all (5-10 difference in Rg, 2x difference in I(0)). I've 
checked to make sure the buffer subtraction algorithm was OK, and I'm 
pretty certain that the buffers were exact matches, so I don't know 
how to explain this variation. There's no evidence of aggregation or 
polydispersity to throw off the values, either. Does anyone know how 
this can happen?






--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs



Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Ed Pozharski

Douglas,

Observed intensities are the best estimates that we can come up with in an 
experiment.

I also agree with this, and this is the clincher.  You are arguing that Ispot-Iback=Iobs is 
the best estimate we can come up with.  I claim that is absurd.  How are you quantifying 
"best"?  Usually we have some sort of discrepancy measure between true and 
estimate, like RMSD, mean absolute distance, log distance, or somesuch.  Here is the 
important point --- by any measure of discrepancy you care to use, the person who estimates 
Iobs as 0 when Iback>Ispot will *always*, in *every case*, beat the person who estimates 
Iobs with a negative value.   This is an indisputable fact.


First off, you may find it useful to avoid words like "absurd" and
"indisputable fact".  I know political correctness may sometimes be
overrated, but if you actually plan to have a meaningful discussion, let's
assume that everyone responding to your posts is just trying to help
figure this out.


To address your point, you are right that J=0 is closer to "true
intensity" than a negative value.  The problem is that we are not after
a single intensity, but rather all of them, as they all contribute to
the electron density reconstruction.  If you replace negative Iobs with
E(J), you will systematically inflate the averages, which may turn
problematic in some cases.  It is probably better to stick with "raw
intensities" and construct theoretical predictions properly to account
for their properties.


What I was trying to tell you is that observed intensities are what we
get from the experiment.  They may be negative, and there is nothing
unphysical about it.  Then you build a theoretical estimate of the observed
intensities, and if you do it right (i.e. by including experimental
errors), they will actually have some probability of being negative.

This background has to be subtracted and what is perhaps the most useful form 
of observation is Ispot-Iback=Iobs.

How can that be the most useful form, when 0 is always a better estimate than a 
negative value, by any criterion?


Given your propensity to refer to what others might say as absurd, I am 
tempted to encourage *you* to come up with a better estimate. 
Nevertheless, let me try to clarify my point.


What is measured in the experiment is Ispot.  It contains Iback, which
our theoretical models cannot possibly account for (because we have no
information at the refinement stage about crystal shape and other
parameters that define the background).  The strategy that has been in use
for decades is to obtain estimates of Iback from the pixels surrounding the
integration spot.  I hope you find that reasonable.


Once we have Iback estimated, Ispot-Iback becomes Iobs - the observed
intensity.  There is no need to convert that value simply to avoid the bad
feeling brought on by negative values.  A correctly formulated theoretical
model predicts Iobs and accounts for the error in it.


Let me state this again - Iobs are not true intensities and not 
estimates of true intensities.  They are experimental values sampling 
Ispot-Iback.  These can be negative.  If a theoretical model that 
approximates Iobs does not allow for negative Iobs, the model is flawed.

These observed intensities can be negative because while their true underlying 
value is positive, random errors may result in Iback>Ispot.  There is absolutely 
nothing unphysical here.

Yes there is.  The only way you can get a negative estimate is to make 
unphysical assumptions.  Namely, the estimate Ispot-Iback=Iobs assumes that 
both the true value of I and the background noise come from a Gaussian 
distribution that is allowed to have negative values.  Both of those 
assumptions are unphysical.


See, I have a problem with this.  Both common sense and the laws of physics
dictate that the number of photons hitting a spot on a detector is a positive
number.  There is no law of physics that dictates that under no
circumstances could Ispot fall below Iback.
Yes, E(Ispot-Iback)>=0.  But P(Ispot-Iback<0)>0, and therefore
experimental sampling of Ispot-Iback is bound to occasionally produce
negative values.  What law of physics is broken when, for a given
reflection, the total number of photons in the spot pixels is less than the
total number of photons in an equal number of pixels in the surrounding
background mask?
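
If a toy simulation helps - my own few lines with made-up photon counts,
not from any actual integration program - counting statistics alone make
nearly half of the measurements of a weak reflection come out negative:

import numpy as np

rng = np.random.default_rng(0)
true_I, back = 2.0, 100.0                  # photons: weak true signal, large background
spot = rng.poisson(true_I + back, 100000)  # photons counted over the spot pixels
bkg = rng.poisson(back, 100000)            # photons counted over the background mask
print(((spot - bkg) < 0).mean())           # ~0.44 - almost half of Iobs are negative

Nothing unphysical happened in any of those trials.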


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Ed Pozharski

On 06/21/2013 10:19 AM, Ian Tickle wrote:
If you observe the symptoms of translational NCS in the diffraction 
pattern (i.e. systematically weak zones of reflections) you must take 
it into account when calculating the averages, i.e. if you do it 
properly parity groups should be normalised separately (though I 
concede there may be a practical issue in that I'm not aware of any 
software that currently has this feature). 


Ian,

I think this is exactly what I was trying to emphasize: applying
some conversion to raw intensities may have a negative impact when the
conversion is based on incorrect or incomplete assumptions.


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] str solving problem

2013-06-21 Thread Ed Pozharski

On 06/20/2013 06:20 AM, Eleanor Dodson wrote:
But you say you took the Balbes model into phaser? and I think Balbes 
automatically runs cycles of refinement so any comment on R factors 
may not mean much.


I have seen a model coming out of the Balbes pipeline hitting extremely high
marks when fed into Phaser while being complete nonsense (it's a 150kDa
multi-domain protein and the resulting domain arrangement made absolutely no
sense).  Refinement was stuck with high R-values and I sadly gave up on
it for now.  I suspected that the refmac step included in the pipeline
artificially shifts the model so that it conforms to the Patterson map
better, which results in a high score in Phaser.  However, when I tried to
reproduce this phenomenon by force-refining some random 12kDa protein
against lysozyme data and then feeding the result into Phaser, it did
not produce any high scores.


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] ctruncate bug?

2013-06-21 Thread Ed Pozharski

On 06/20/2013 01:07 PM, Douglas Theobald wrote:

How can there be nothing "wrong" with something that is unphysical?  
Intensities cannot be negative.


I think you are confusing two things - the true intensities and observed 
intensities.


True intensities represent the number of photons that diffract off a
crystal in a specific direction or, for the QED-minded, the relative
probabilities of a single photon being found in a particular area of the
detector when its probability wave function finally collapses.


True intensities certainly cannot be negative and in crystallographic 
method they never are. They are represented by the best theoretical 
estimates possible, Icalc.  These are always positive.


Observed intensities are the best estimates that we can come up with in
an experiment.  These are determined by integrating pixels around the
spot where a particular reflection is expected to hit the detector.
Unfortunately, science has not yet invented a method that would allow one to
suspend a crystal in vacuum while also removing all of the outside
solvent.  Nor have we included diffuse scatter in our theoretical
model.  Because of that, the full reflection intensity contains background
signal in addition to the Icalc.  This background has to be subtracted,
and what is perhaps the most useful form of observation is Ispot-Iback=Iobs.


These observed intensities can be negative because while their true
underlying value is positive, random errors may result in Iback>Ispot.
There is absolutely nothing unphysical here.  Replacing Iobs with E(J) is
not only unnecessary, it's ill-advised as it will distort intensity
statistics.  For example, let's say you have translational NCS aligned
with crystallographic axes, and hence some set of reflections is
systematically absent.  If all is well, <Iobs> ~ 0 for the subset, while
<E(J)> is systematically positive.  This obviously happens because the
standard Wilson prior is wrong for these reflections, but I digress, as
usual.
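
A two-minute numerical check of that claim - my own sketch with made-up
numbers, using the standard acentric posterior mean from French & Wilson:

import numpy as np
from scipy.stats import norm

sig, S = 1.0, 1.0
iobs = np.random.default_rng(1).normal(0.0, sig, 100000)  # noise-only subset: true I = 0
h = iobs / sig - sig / S
ej = sig * (h + norm.pdf(h) / norm.cdf(h))                # posterior means E(J), all > 0
print(iobs.mean(), ej.mean())   # ~0.00 vs a clearly positive number: inflated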


In summary, there is indeed nothing wrong, imho, with negative Iobs.  
The fact that some of these may become negative is correctly accounted 
for once sigI is factored into the ML target.


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] ctruncate bug?

2013-06-19 Thread Ed Pozharski
Dear Kay and Jeff,

frankly, I do not see much justification for any rejection based on
h-cutoff.  

French&Wilson only talk about I/sigI cutoff, which also warrants further
scrutiny.  It probably could be argued that reflections with I/sigI<-4
are still more likely to be weak than strong so F~0 seems to make more
sense than rejection.  The nature of these outliers should probably be
resolved at the integration stage, but these really aren't that
numerous.

As for the h>-4 requirement, I don't see French&Wilson even arguing for it
anywhere in the paper.  The h variable does not reflect any physical
quantity that would come with a prior expectation of being non-negative,
and while the posterior of the true intensity (for acentric reflections)
is distributed according to the truncated normal distribution N(sigma*h,
sigma^2), I don't really see why h<-4 is "bad".
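
For what it's worth, here is a minimal numerical sketch of that posterior
mean - my own few lines, not the ctruncate or XDSCONV code - which shows
E(J) staying positive and simply approaching zero as h becomes more
negative (Kay's point below):

from scipy.stats import norm

def posterior_mean_J(h, sigI):
    # E(J) for an acentric reflection: the Wilson prior times a Gaussian
    # likelihood gives a posterior N(sigI*h, sigI^2) truncated at J >= 0,
    # whose mean is sigI * (h + phi(h)/Phi(h)).  Note norm.cdf(h) underflows
    # for h < ~-37, the sort of numerical difficulty mentioned below.
    return sigI * (h + norm.pdf(h) / norm.cdf(h))

for h in (1.0, 0.0, -4.0, -10.0):
    print(h, posterior_mean_J(h, 1.0))   # 1.29, 0.80, 0.23, 0.10 - never negative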

From what I understand, Kay has removed the h-cutoff from XDSCONV (or never
included it in the first place).  Perhaps ctruncate/phenix should change
too?  Or am I misunderstanding something and there is some rationale for
the h<-4 cutoff?

Cheers,

Ed.


On Wed, 2013-06-19 at 06:47 +0100, Kay Diederichs wrote:
> Hi Jeff,
> 
> what I did in XDSCONV is to mitigate the numerical difficulties associated 
> with low h (called "Score" in XDSCONV output) values, and I removed the h < 
> -4 cutoff. The more negative h becomes, the closer to zero is the resulting 
> amplitude, so not applying a h cutoff makes sense (to me, anyway).
> XDSCONV still applies the I < -3*sigma cutoff, by default.
> 
> thanks,
> 
> Kay

-- 
I don't know why the sacrifice thing didn't work.  
Science behind it seemed so solid.
Julian, King of Lemurs


Re: [ccp4bb] ctruncate bug?

2013-06-17 Thread Ed Pozharski

Jeff,

thanks - I can see the same equation and cutoff applied in the ctruncate
source.  Here is the relevant part of the code:


// Bayesian statistics tells us to modify I/sigma by subtracting off sigma/S
// where S is the mean intensity in the resolution shell
h = I/sigma - sigma/S;
// reject as unphysical reflections for which I < -3.7 sigma, or h < -4.0
if (I/sigma < -3.7 || h < -4.0 ) {
    nrej++;
    if (debug) printf("unphys: %f %f %f %f\n",I,sigma,S,h);
    return(0);
}


These seem to be arbitrary cutoff choices, given that they are
hard-coded.  At the very least, the cutoffs should depend on the total
number of reflections to represent familywise error rates.


It is, however, the h-based rejection that seems most problematic to me.
In the dataset in question, up to 20% of reflections are rejected in the
highest resolution shell (granted, I/sigI there is 0.33).  I would
expect reflections to be rejected when they are deemed to be outliers
due to reasons other than statistical errors (e.g. streaks, secondary
lattice spots in the background, etc.).  I must say that this was done
with extremely good quality data, so I doubt that 1 out of 5 reflections
returns some physically impossible measurement.


What is happening is that <sigma> ~ 3S in the highest resolution shell,
so sigma/S ~ 3 and h = I/sigma - sigma/S drops below -4.0 already at
I/sigma < -1, which many reflections satisfy.  This does not mean that the
reflections are "unphysical" though, just that the shell as a whole has
mostly weak data (in this case 89% with I/sigI<2 and 73% with I/sigI<1).


What is counterintuitive is why I have to discard reflections that
are just plain weak, and not really outliers.


Cheers,

Ed.



On 06/17/2013 10:29 PM, Jeff Headd wrote:

Hi Ed,

I'm not directly familiar with the ctruncate implementation of French 
and Wilson, but from the implementation that I put into Phenix (based 
on the original F&W paper) I can tell you that any reflection where 
(I/sigI) - (sigI/mean_intensity) is less than a defined cutoff (in our 
case -4.0), then it is rejected. Depending on sigI and the mean 
intensity for a given shell, this can result in positive intensities 
that are also rejected. Typically this will affect very small positive 
intensities as you've observed.


I don't recall the mathematical justification for this and don't have 
a copy of F&W here at home, but I can have a look in the morning when 
I get into the lab and let you know.


Jeff


On Mon, Jun 17, 2013 at 5:04 PM, Ed Pozharski <epozh...@umaryland.edu> wrote:


I noticed something strange when processing a dataset with
imosflm.  The
final output ctruncate_etc.mtz, contains IMEAN and F columns, which
should be the conversion according to French&Wilson.  Problem is that
IMEAN has no missing values (100% complete) while F has about 1500
missing (~97% complete)!

About half of the reflections that go missing are negative, but
half are
positive.  About 5x more negative intensities are successfully
converted.  Most impacted are high resolution shells with weak signal,
so I am sure impact on "normal" refinement would be minimal.

However, I am just puzzled why would ctruncate reject positive
intensities (or negative for that matter - I don't see any cutoff
described in the manual and the lowest I/sigI for successfully
converted
reflection is -18).

Is this a bug or feature?

Cheers,

Ed.

--
I don't know why the sacrifice thing didn't work.
Science behind it seemed so solid.
Julian, King of Lemurs





--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs



[ccp4bb] ctruncate bug?

2013-06-17 Thread Ed Pozharski
I noticed something strange when processing a dataset with imosflm.  The
final output ctruncate_etc.mtz, contains IMEAN and F columns, which
should be the conversion according to French&Wilson.  Problem is that
IMEAN has no missing values (100% complete) while F has about 1500
missing (~97% complete)!

About half of the reflections that go missing are negative, but half are
positive.  About 5x more negative intensities are successfully
converted.  Most impacted are high resolution shells with weak signal,
so I am sure impact on "normal" refinement would be minimal.

However, I am just puzzled why would ctruncate reject positive
intensities (or negative for that matter - I don't see any cutoff
described in the manual and the lowest I/sigI for successfully converted
reflection is -18).

Is this a bug or feature?

Cheers,

Ed.

-- 
I don't know why the sacrifice thing didn't work.  
Science behind it seemed so solid.
Julian, King of Lemurs


Re: [ccp4bb] sigma value in structure file

2013-06-17 Thread Ed Pozharski
The only thing that seems to make sense is bonds rmsd - but you should
ask the annotator for specifics directly.  If it is bonds rmsd, this has
been discussed many times - just google "rmsd bonds ccp4bb" and look for
most recent entries.

On Mon, 2013-06-17 at 12:11 +0530, Faisal Tarique wrote:
> Dear all
> 
> 
> During PDB deposition the annotator asked me to verify and review the sigma
> value in the structure file, which in my case was -0.03. I have some
> basic queries and request you all to please answer them. My first
> question is, what actually is a "sigma value" of a structure file.
> 2nd) where the value is mentioned?. 3rd) and What is the optimum sigma
> value range ?
> 
> 
> -- 
> Regards
> 
> Faisal
> School of Life Sciences
> JNU

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /


Re: [ccp4bb] Concerns about statistics

2013-06-15 Thread Ed Pozharski

On 06/14/2013 07:00 AM, John R Helliwell wrote:
Alternatively, at poorer resolutions than that, you can monitor if the 
Cruickshank-Blow Diffraction Precision Index (DPI) improves or not as 
more data are steadily added to your model refinements.

Dear John,

unfortunately the behavior of DPIfree is less than satisfactory here -
in a couple of cases I looked at, it just steadily improves with
resolution.  The example I have in front of me right now takes resolution
down from 2.0A to 1.55A, and DPIfree goes down from ~0.17A to 0.09A at
an almost constant pace (slowing from 0.021 A/0.1A to 0.017 A/0.1A
around 1.75A).


Notice that in this specific case I/sigI at 1.55A is ~0.4 and 
CC(1/2)~0.012 (even this non-repentant big-endian couldn't argue there 
is good signal there).


DPIfree is essentially proportional to Rfree * d^(2.5)  (this is 
assuming that No~1/d^3, Na and completeness do not change).  To keep up 
with resolution changes, Rfree would have to go up ~1.9 times, and 
obviously that is not going to happen no matter how much weak data I 
throw in.


The maximum-likelihood e.s.u. reported by Refmac makes more sense in 
this particular case as it clearly slows down big time around 1.77A (see 
https://plus.google.com/photos/113111298819619451614/albums/5889708830403779217). 
Coincidentally, Rfree also starts going up rapidly around the same 
resolution.  If anyone is curious what's I/sigI is at the "breaking 
point" it's ~1.5 and CC(1/2)~0.6.  And to bash Rmerge a little more, 
it's 112%.


So there are two questions I am very much interested in here.

a) Why is DPIfree so bad at this?  Can we even believe it given its
erratic behavior in this scenario?


b) I would normally set up a simple data mining project to see how
common this ML_esu behavior is, but there is no easily accessible source
of data processed beyond I/sigI=2, let alone I/sigI=1 (are structural
genomics folks reading this, and do they maybe have such data to mine?).
I can look into all of my own datasets, but that would be a biased 
selection of several crystal forms.  Perhaps others have looked into 
this too, and what are your observations? Or maybe you have a dataset 
processed way beyond I/sigI=1 and are willing to either share it with me 
together with a final model or run refinement at a bunch of different 
resolutions and report the result (I can provide bash scripts as needed).


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] Concerns about statistics

2013-06-13 Thread Ed Pozharski
Tim,

my personal preference always was I/sigI=1.  In my Scalepack days, I
always noticed that ~30% of the reflections in the I/sigI=1 shells had
I/sigI>2, and formed an unverified belief that there should be some
information there.

In my experience, CC1/2=0.5 would normally yield I/sigI~1, not 2.  This
is based predominantly on Scala/Aimless.

Cheers,

Ed.

On Thu, 2013-06-13 at 18:20 +0200, Tim Gruene wrote:
> 
> On 06/13/2013 06:16 PM, Ed Pozharski wrote:
> > [...] With that said, I am pretty sure that in vast majority of
> > cases structural conclusions derived with I/s=2 vs CC1/2=0.5 vs
> > DR=0 cutoff will be essentially the same.
> 
> Hi Ed,
> in my experience, CC(1/2) > 0.7 corresponds quite well to I/sigI > 2.0
> rather than CC(1/2) > 0.5 (again, with the default resolution shells
> from xprep, which also plots CC(1/2) vs. resolution). Are the above numbers
> based on experience, too? If so, which program do you usually use to
> look at these statistics?
> 
> Tim
> 
> 

-- 
I don't know why the sacrifice thing didn't work.  
Science behind it seemed so solid.
Julian, King of Lemurs


Re: [ccp4bb] Concerns about statistics

2013-06-13 Thread Ed Pozharski
On Thu, 2013-06-13 at 08:44 -0700, Andrea Edwards wrote:
> In this case, the author should report a correlation coefficient along
> with the other standard statistics (I/sigI, Rmerg, Completeness,
> redundancy, ect.)? 

Won't hurt.  

> What about Rpim instead of Rmerg? and if Rpim is reported, what should
> be the criteria for resolution cutoff?

Rmerge has been known to be deeply flawed for ~15 years.  IMHO, it should
not be reported at all.  While Rpim is better, the whole point of
Karplus & Diederichs is that R-type measures are not very useful in
deciding the resolution cutoff.

> Also, if this paper is the "new standard" how should we regard
> statistic reported in the literature? 

We should keep in mind that conservative resolution cutoff criteria have
been used in the field for decades.

> Or.. more importantly, how do we go about reviewing current literature
> that does not report this statistic?

Structures refined against data cut at I/sigma=2 should be considered
likely to have had their resolution cut off too conservatively.

With that said, I am pretty sure that in the vast majority of cases
structural conclusions derived with an I/s=2 vs CC1/2=0.5 vs DR=0 cutoff
will be essentially the same.
> 

-- 
"I'd jump in myself, if I weren't so good at whistling."
   Julian, King of Lemurs


Re: [ccp4bb] Extracting .pdb info with python

2013-06-06 Thread Ed Pozharski
On Thu, 2013-06-06 at 14:41 +1000, Nat Echols wrote:
> You should resist the temptation to write your own PDB parser; that
> way lies pain and suffering.  There are multiple free libraries for
> Python that can be used for this task - I recommend either CCTBX or
> BioPython (probably the latter if you don't need to do very much with
> the models).

Well, that depends on what one is trying to accomplish.  Say, I just
want to count how many atoms I have that are partially occupied in an
average PDB file.  All I need to know is that occupancy is stored in
columns 55:60 as Real(6.2), and that atom record lines begin with "ATOM
"|"HETATM".  With basic python/c/perl/whatever knowledge I can write my
own "occupancy parser" faster than this post.  Getting BioPython or
CCTBX to work will definitely take longer.
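
Something like this, say - a throwaway sketch with a hypothetical file
name, using exactly the two facts above:

partial = 0
for line in open("model.pdb"):
    if line.startswith(("ATOM ", "HETATM")):   # atom records only
        if float(line[54:60]) < 1.0:           # occupancy, columns 55-60, Real(6.2)
            partial += 1
print(partial, "partially occupied atoms")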

By writing primitive parsers one also gains speed and portability, as
well as extending one's programming skills.  Basically, this choice depends
on the complexity of the task and long/short-term goals.

Cheers,

Ed.

-- 
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] Extracting .pdb info with python

2013-06-06 Thread Ed Pozharski
ATOM records have a fixed format, so you can (and should) use string
slicing instead, like so (logically a one-liner; the parentheses keep it
valid when wrapped across lines)

(serial, aname, altloc, resn, chid, resi, insCode, x, y, z, occ, b,
 element, q) = (line[6:11], line[12:16], line[16], line[17:20], line[21],
 line[22:26], line[26], line[30:38], line[38:46], line[46:54],
 line[54:60], line[60:66], line[76:78], line[78:80])

or with more explicit typing

(serial, aname, altloc, resn, chid, resi, insCode, x, y, z, occ, b,
 element, q) = (int(line[6:11]), line[12:16].strip(), line[16].strip(),
 line[17:20].strip(), line[21], int(line[22:26]), line[26],
 float(line[30:38]), float(line[38:46]), float(line[46:54]),
 float(line[54:60]), float(line[60:66]), line[76:78].strip(),
 line[78:80])

Cheers,

Ed.



On Thu, 2013-06-06 at 04:37 +, GRANT MILLS wrote:
> Dear CCP4BB,
> 
> I'm trying to write a simple python script to retrieve and manipulate
> PDB data using the following code:
> 
> #for line in open("PDBfile.pdb"):
> #if "ATOM" in line:
> #column=line.split()
> #c4=column[4]
> 
> and then writing to a new document with:
> 
> #with open("selection.pdb", "a") as myfile:
> #myfile.write(c4+"\n")
> 
> Except for if the PDB contains columns which run together such as the
> occupancy and B-factor in the following:
> 
> ATOM608  SG  CYS A  47  12.866 -28.741  -1.611  1.00201.10
> S  
> ATOM609  OXT CYS A  47  14.622 -24.151  -1.842  1.00100.24
> O 
> 
> My script seems to miscount the columns and read the two as one
> column, does anyone know how to avoid this? (PS, I've googled this
> like crazy but I either don't understand or the link is irrelevant)
> 
> Any advice would help.
> Thanks for your time,
> Grant
> 

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /


Re: [ccp4bb] To build ssDNA to base pair with the other strand of DNA in coot

2013-05-29 Thread Ed Pozharski
> I am wondering whether there is a function I could use to build the
> missing bases in one strand of the DNA to base pair with the other
> strand of DNA which is complete...

Calculate->Other modeling tools-> Base pair...

-- 
Coot verendus est


Re: [ccp4bb] use only companies that you know to purchase chemicals

2013-05-29 Thread Ed Pozharski
On Wed, 2013-05-29 at 12:30 -0400, Jeremy Stevenson wrote:
> In this particular case you can see the website was registered in
> September of 2012, which is a good indication that it was set up just
> to scam people.

Just curious - why does the registration date indicate the unsavory nature of
"Jieke"?

-- 
After much deep and profound brain things inside my head, 
I have decided to thank you for bringing peace to our home.
Julian, King of Lemurs


Re: [ccp4bb] Short contact between symmetry equivalents

2013-05-27 Thread Ed Pozharski

On 05/27/2013 12:14 PM, Kavyashree Manjunath wrote:

Later I tried with 0.8, 0.99 for which the map was normal
and also validation did not report it as short contact.
Is it ok if I give 0.99 occupancy?
"Validation" most likely will not report any short contacts if occupancy 
is <1.  If the distance between atoms is still ~1.8A, you have a 
problem.  Perhaps it is not an acetate.


--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] Short contact between symmetry equivalents

2013-05-27 Thread Ed Pozharski

On 05/27/2013 11:27 AM, ka...@ssl.serc.iisc.in wrote:

Sir,

Ok. It is an acetate ion which interacts with its symmetry
equivalent; only one of its oxygen atoms is close to
its symmetry equivalent, not the entire ion. So do I
need to give lower occupancy for this ion?

Thank you
Regards
Kavya




It does not interact - you cannot have a 1.8A distance between atoms.
Assuming that it is indeed acetate, it must be partially occupied, 0.5 or
less.  Keep in mind that when you lower the occupancy you may see additional
density for whatever occupies the space on the other side of the
symmetry element (e.g. water), which you may need to model.




--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] Short contact between symmetry equivalents

2013-05-27 Thread Ed Pozharski
It is probably the wrong question to ask here.  Pretty much everything is
"tolerated" by the PDB during deposition; the report you get is advice,
not instruction.  I wonder whether anyone has an example of
RCSB/PDBe/PDBj ever turning down a submitted structure.


The right question is whether the short contact you mention is tolerated
by the laws of nature.  It's fairly common to have, say, a water molecule
split in two positions near a symmetry axis - as long as you have it at
occupancy <1.0, it's ok.


On 05/27/2013 05:21 AM, Kavyashree Manjunath wrote:

Dear users,

  Is short contact (1.83Ang) between an atom and symmetry
equivalent of itself tolerated during deposition? I am not
able to get rid of this short contact appearing after refinement.

Thank you
Regards
Kavya





--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] cryo condition

2013-05-23 Thread Ed Pozharski
Matt,

with this technique, how do you prevent the crystal from drying up (other
than "doing it fast")?  I know Thorne's group does this trick under oil.
If you take no extra precautions, do you have an estimate of how often
diffraction is destroyed by this?

On the other hand, it's quite possible that what destroys resolution
when crystals dry up is the increase in concentration of non-volatile mother
liquor components, which shouldn't be happening here to the same degree.

Cheers,

Ed.

On Thu, 2013-05-23 at 14:38 +0200, Matthew BOWLER wrote:
> Hi Faisal,
> if your solvent channels are smaller than 40A in the largest dimension 
> (most are) you can use a mesh loop to pick up the crystal and then wick 
> away all of the mother liquor. You can then flash cool your crystal 
> without having to transfer the crystal to another solution. Good luck, 
> Matt

-- 
After much deep and profound brain things inside my head, 
I have decided to thank you for bringing peace to our home.
Julian, King of Lemurs


Re: [ccp4bb] DNA version converter

2013-05-22 Thread Ed Pozharski
Dear Tim,

I am sorry that you feel offended.  It is rather unfortunate that you
got an impression from my secondary comment that I am asking for help
with pdbvconv when I wasn't.  It is also rather unfortunate that I have
figured out what the problem was myself and therefore did not have an
opportunity to ask for and fully utilize your help and that of other
ccp4bb members.  In general, it is rather unfortunate that helping
others often feels and occasionally is a waste of time.  Hopefully this
unfortunate experience will not dissuade you from positively
contributing to the ccp4bb to the same extent to which it dissuades me
from ever again soliciting any advice through this venue.

Cheers,

Ed.

PS.  Naturally, at the time of the original post and the subsequent one
you responded to I did not yet know what was wrong with the specific pdb
file.  To have such knowledge and yet ask why pdbvconv fails (which I
did not ask) would be both stupid and evil.  Then again, I may be both
of these things.

On Wed, 2013-05-22 at 21:55 +0200, Tim Gruene wrote:
> Dear Ed,
> 
> I do feel offended because I follow the ccp4bb with the intention of
> helping people with my answers, which does take time. If it turns out
> that I wasted my time because of the lack of information, I consider the
> question asked not to follow what I consider netiquette. I doubt that one
> line from a PDB file or the point that you have used pdbvconv before
> (especially in case you were aware of the problem being due to a
> simple shift - of course I do not know whether you were aware of it
> when opening the thread) and it failed would have revealed any
> information that would give your competitors any advantage, but it
> would have narrowed down the problem and probably the number of people
> spending time trying to find and give a helpful answer.
> 
> Regards,
> Tim
> 
> On 05/22/2013 07:55 PM, Ed Pozharski wrote:
> > On Wed, 2013-05-22 at 18:09 +0200, Tim Gruene wrote:
> >> the answers you received were correct with respect to the 
> >> question you asked. If they are not satisfactory, you have not 
> >> given sufficient information.
> >> 
> > Tim,
> > 
> > Not sure when I expressed any dissatisfaction with replies I 
> > received. I asked whether someone had a simple tool to turn v3 DNA 
> > records back to v2.  I got an excellent off-list suggestion to use 
> > Remediator tool from kinemage/molprobity (writen by Jeff Headd and 
> > Robert Immormino).
> > 
> > It was probably easy to guess that I tried pdbvconv and ran into 
> > problems (in fact, buster takes care of conversion internally 
> > unless it fails).  I appreciate your initial comment that pdbvconv 
> > works for you and was simply pointing out that my observations are 
> > not of general nature and obviously specific to a particular file
> > I was trying to convert.  It appears that you were offended by
> > that and if so, it was not my intent.  In my defense, I only asked
> > for available conversion tools and did not ask specifically for
> > help with pdbvconv. With this in mind, I hope I can be forgiven for
> > not posting unpublished structural model on a bulletin board in
> > order to provide sufficient information for answering a question I
> > did not ask.
> > 
> > If you are interested in specifics, the problem was that some other
> > program made residue names left-justified.
> > 
> > Cheers,
> > 
> > Ed.
> > 
> > 
> 

-- 
Bullseye!  Excellent shot, Maurice.
  Julian, King of Lemurs.


Re: [ccp4bb] DNA version converter

2013-05-22 Thread Ed Pozharski
On Wed, 2013-05-22 at 18:09 +0200, Tim Gruene wrote:
> the answers you received were correct with respect to the question you
> asked. If they are not satisfactory, you have not given sufficient
> information.
> 
Tim,

Not sure when I expressed any dissatisfaction with replies I received.
I asked whether someone had a simple tool to turn v3 DNA records back to
v2.  I got an excellent off-list suggestion to use the Remediator tool from
kinemage/molprobity (written by Jeff Headd and Robert Immormino).

It was probably easy to guess that I tried pdbvconv and ran into
problems (in fact, buster takes care of conversion internally unless it
fails).  I appreciate your initial comment that pdbvconv works for you
and was simply pointing out that my observations are not of general
nature and obviously specific to a particular file I was trying to
convert.  It appears that you were offended by that and if so, it was
not my intent.  In my defense, I only asked for available conversion
tools and did not ask specifically for help with pdbvconv. With this in
mind, I hope I can be forgiven for not posting unpublished structural
model on a bulletin board in order to provide sufficient information for
answering a question I did not ask.

If you are interested in specifics, the problem was that some other
program made residue names left-justified.

Cheers,

Ed.


-- 
"Hurry up before we all come back to our senses!"
   Julian, King of Lemurs


Re: [ccp4bb] DNA version converter

2013-05-22 Thread Ed Pozharski
Tim,

naturally, the issue is specific to a particular pdb file (it would be
indeed strange if the conversion tool from a major software package
failed in a general sense).

Ed.

On Wed, 2013-05-22 at 08:12 +0200, Tim Gruene wrote:
> Hi Ed,
> 
> 
> #> pdbvconv -p udo.pdb -o udov.pdb
> #> pdbvconv -p udov.pdb -o udovv.pdb
> #> diff -q udo.pdb udov.pdb
> Files udo.pdb and udov.pdb differ
> #> diff -q udov.pdb udovv.pdb
> Files udov.pdb and udovv.pdb differ
> #> diff -q udo.pdb udovv.pdb
> #>
> 
> it works for me, Version: 1.0 (11 June 2008) from buster 2.10. Which
> version of pdbvconv are you using?
> Best,
> Tim
> 
> 
> On 05/21/2013 10:40 PM, Ed Pozharski wrote:
> > On 05/21/2013 04:35 PM, Francis Reyes wrote:
> >> Since you're using buster, have you tried global phasing's own 
> >> pdbvconv tool?
> > Naturally, but it leaves file unchanged.
> > 
> 

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /


Re: [ccp4bb] DNA version converter

2013-05-21 Thread Ed Pozharski

On 05/21/2013 04:35 PM, Francis Reyes wrote:

Since you're using buster, have you tried global phasing's own pdbvconv tool?

Naturally, but it leaves the file unchanged.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


[ccp4bb] DNA version converter

2013-05-21 Thread Ed Pozharski
Does anyone have a script to convert a pdb file with DNA atom records from
v3 back to v2?  I can certainly write my own and am asking only if you
already have it written.  Strictly speaking, this is not ccp4-related -
apparently, buster expects the old format.


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] question about CCP4 scripts

2013-05-09 Thread Ed Pozharski
At this point you do have the scalepack2mtz output file (BTW,
imosflm/aimless is wholeheartedly recommended by this convert), and you
can easily extract all the info from there.  There is mtzdmp, of course,
but it's easier to parse the actual mtz file (hey, the records are
actually text).  Like so:

egrep -oa "SYMINF.{74}" foo.mtz | awk '{print "symm "$5}'

Gives you the space group *number*; most ccp4 programs I know accept
that in addition to the string symbol.  But if you really need the latter,

echo symm $(egrep -oa "SYMINF.{74}" foo.mtz | cut -d"'" -f 2 | sed "s/ //g")

will do (the sed strips the spaces from a symbol like 'P 43 21 2').

As for unit cell parameters, this should work

egrep -oa "CELL.{76}" foo.mtz | sed -n 1p

Keep in mind that this extracts the "global cell" and will be
problematic if you have a multi-dataset file (which I presume you don't).
If you need the Nth dataset, grab the DCELL record, e.g. for dataset #1

egrep -oa "DCELL.{75}" foo.mtz | sed -n 2p

Cheers and 

https://xkcd.com/208/


Ed

On Wed, 2013-05-08 at 23:37 -0400, Joe Chen wrote:

> Hi All,
> 
> 
> 
> 
> 
> I am trying to write a shell script to streamline a few steps, one of
> which is Unique, see below.  As you can see, this program requires
> symmetry and cell parameters.  In the CCP4 GUI Scalepack2mtz, this info
> is automatically extracted from the .sca file (first two lines).  But I
> don't know if there is a way to do this in a script, so I don't need to
> type these values for each dataset.  Thank you in advance for your
> help.
> 
> 
> #!/bin/sh
> # unique.exam
> # 
> # runnable test script for the program "unique" - this will use this
> # program to generate a reflection list containing null values.
> # 
> 
> set -e
> 
> unique hklout ${CCP4_SCR}/unique_out.mtz << eof
> labout F=F SIGF=SIGF
> symmetry p43212
> resolution 1.6
> cell 78.1 78.1 55.2 90.0 90.0 90.0
> eof
> 
> 
> 
> 
> 
> 
> -- 
> Best regards,
> 
> Joe

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /



Re: [ccp4bb] Program or server to predict Kd from complex structure

2013-04-18 Thread Ed. Pozharski
I don't believe such a program/server exists.  Notice that you are asking for
something that *can* predict Kd.  One can *try* making such predictions, and
they may even be routinely in the ballpark, assuming that you are satisfied
with being off by, say, an order of magnitude.

One can easily predict general trends.  For example, a larger buried apolar
surface will generally result in a lower Kd.  As for the accuracy of
individual Kd predictions, that's another story.

I don't know what your goal is, but if you are trying to replace
experimental Kd determination with a magic program, please don't.

Cheers, 

Ed.

 Original message 
From: Wei Liu  
Date: 04/18/2013  4:39 AM  (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] Program or server to predict Kd from complex structure 
 
Dear all,

Does anyone know a program or web server that can predict Kd value between two 
proteins from a solved complex structure?

Regards
Wei

Re: [ccp4bb] salt or not?

2013-04-15 Thread Ed. Pozharski
A protein-DNA complex crystal with channels too small for the dye is
*extremely* unlikely, imho.

 Original message 
From: Ulrike Demmer  
Date: 04/15/2013  8:48 AM  (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] salt or not? 
 
Dear Careina,

although your crystals don't take up the Izit dye, it sounds promising.  The
uptake of Izit depends on the solvent channels of the crystal -
sometimes the dye just can't get in.
Concerning the calcium chloride - if the concentration is not too high, and
there are no other ingredients which could form a less soluble salt, there is
the possibility that you have got protein crystals.  I once had a condition
with 34% MPD + 0.1 M buffer + 0.1 M calcium chloride which produced nicely
diffracting crystals.
To speak from my experience, I think 1 month after setting up the trays the
drops should not be dried out.  If the wells are sealed properly, new crystals
can appear even after 1 year.

You should definitely check the diffraction - then you will know for sure.

Cheers,

Ulrike


Re: [ccp4bb] CCP4 Update victim of own success

2013-04-12 Thread Ed Pozharski

On 04/12/2013 06:03 PM, Nat Echols wrote:
On Fri, Apr 12, 2013 at 2:45 PM, Boaz Shaanan
<bshaa...@exchange.bgu.ac.il> wrote:


Whichever way the input file for the run is prepared (via GUI or
command line), anybody who doesn't inspect the log file at the end
of the run is doomed and bound to commit senseless errors. I was
taught a long time ago that computers always do what you told them
to do and not what you think you told them, which is why
inspecting the log file helps.


I agree in principle - I would not advocate that anyone (*especially* 
novices) run crystallography software as a "black box".  However, 
whether or not a program constitutes a black box has nothing to do 
whether it runs in a GUI or not.  The one advantage a GUI has is the 
ability to convey inherently graphical information (plots, etc.).  
That it is still necessary to inspect the log file(s) carefully 
reflects the design of the underlying programs; ideally any and all 
essential feedback should also be displayed in the GUI (if one 
exists).  Obviously there is still much work to be done here.


-Nat


It is hard to blame "novices" for running crystallography software as a 
black box when the websites from which they download the said software 
use the word "automated" to describe it.  Because, at least according to 
wikipedia (another great resource that should be used with care), 
"automation is the operation of machinery without human supervision".  
Checking the log-files or messages supplied by GUI seems to fall under 
"human supervision", which "automated" programmes should not really 
require.  I am not advocating return to the stone age when naming a 
tutorial for a widely used model building software "... for morons" was 
probably considered a joke (not a good one too).  I am just saying that 
it is perhaps quite predictable that with promise of automation comes 
the expectation of, well, automation.  Whether the true automation of 
crystallographic structure determination may become available in the 
future is perhaps debatable.  Whether it is already available probably 
isn't.


On the broader question of GUI versus command line, both obviously have
their uses.  Mastering the command line gives one flexibility and perhaps
greater insight into what programmes actually do.  Do I prefer a little
button that opens a file chooser dialog over sam-atom-in?  Absolutely.
But I am glad that the --pdb and --auto command line options are supplied,
because I can then write a little bash pipeline to pass 50 expected
protein-ligand complex datasets through a simple refmac-coot cycle to
quickly see which ones are interesting.  In that regard, both ccp4 and
phenix are doing it the right way - the gui is simply a gateway to the
command-line controlled code.  I can then choose the interface that fits a
particular situation.


As for the relatively new CCP4 update feature, it is absolutely awesome.

Cheers,

Ed.



--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs



Re: [ccp4bb] DNA structures superimpose

2013-04-12 Thread Ed. Pozharski
Doesn't lsqkab work?  It just needs proper atom matching.

Coot definitely does superimpose DNA.  One problem is that simple-lsq works on
a single chain, but you can trick it by changing the chain ID and renumbering
the other strand.

As for helix rotation, you can derive it from the rotation reported by LSQ.
But it is more common to look at the DNA geometry directly for changes (3DNA
or Curves+).

 Original message 
From: "Veerendra Kumar (Dr)"  
Date: 04/12/2013  2:18 AM  (GMT-05:00) 
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] DNA structures superimpose 
 
Dear CCP4 members,
Is there any program to superimpose the DNA structures? I also want to measure 
the relative domain rotation angle. I tried using DynDom but it does not work 
for me. 
Can someone suggest a program which can output the rotation angles? 

Thank you 

Best Regards

Veerendra kumar



Re: [ccp4bb] Building ideal B DNA model in Coot

2013-04-06 Thread Ed Pozharski

On 04/04/2013 04:51 PM, 李翔 wrote:

Hi everyone,

I met a problem when trying to build an ideal DNA model in Coot. The
calculated DNA looks like less than 10.5 bp/turn, probably about
10 bp/turn. Is there a way for me to change the pitch to make it 10.5
bp/turn in Coot?


Thanks for your kind help!

Sincerely,
Frank
I do not know the answer to your question (but suspect the answer is
no).  If you want better control over DNA conformation, consider 3DNA
(http://x3dna.org) - it can build the DNA molecule from a set of
parameters you provide.
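
For what it's worth, the arithmetic itself is simple: 10.5 bp/turn
corresponds to an average helical twist of 360/10.5 ~ 34.3 degrees per
base-pair step, versus 36 degrees for a 10 bp/turn model.  If memory of
the 3DNA docs serves (do verify at x3dna.org), you can set the twist
column to 34.3 in the base-pair step parameter file that 3DNA's rebuild
consumes.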




--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] COOT usage for structure comparison

2013-04-04 Thread Ed Pozharski

On 04/04/2013 04:29 AM, Tim Gruene wrote:

Dear --,
Are we, the "ccp4bb community", recently on the hunt to find new and
exciting ways to make sure people stop asking questions?  The gentleman from
Moscow has clearly disclosed his full name and affiliation, but perhaps
I am wrong and subtle criticism of his signature-formatting skills is
highly relevant.  Feel free to call me Julian from now on :)


Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] delete subject

2013-03-28 Thread Ed Pozharski
On Thu, 2013-03-28 at 12:15 +, Tom Van den Bergh wrote:
> I think this is a good time to end the discussion.

As a general comment, discussions on boards like ccp4bb often digress
and take a direction different from your original intent.  I can understand
your desire to try to control the situation, but if people on this board
feel that the questions of data sharing, student training, netiquette
and the proper choice of resolution cutoff are worthy of further discussion
(which may not have much to do with the specifics of your original request
for assistance), it is their right too.

What may have caused some extra grief is this unfortunate turn of phrase
in your original post

"Could you try some refinement for me, because this is first structure
that i need to solve as a student and i dont have too many experience
with it."

It goes a bit beyond the usual "my R-values are too high what should I
do" question and may be instinctively construed as if you expect someone
to actually do your work for you (I am sure that is not what you asked).
So the somewhat vigorous reaction you received likely results from a
misunderstanding of your intent (albeit posting your data is very unusual
and strengthens the impression) and a perhaps misplaced feeling that you
abandoned attempts to resolve the problem independently too soon.
I did *not* look at your data and therefore I may be completely wrong
here, but it is my understanding that your actual issue was not
realizing there could be more than one molecule in the asymmetric unit.

More traditional route is to describe your situation in general terms
and offer to provide data to those willing to take a closer look.

Cheers,

Ed.


-- 
"Hurry up before we all come back to our senses!"
   Julian, King of Lemurs


Re: [ccp4bb] Isothermal titration calorimetry

2013-03-26 Thread Ed. Pozharski
This might have changed, but in the past the file formats were different.
Microcal files are text, while TA's are binary.  I do have the actual
description of TA's format if anyone is interested, but it must be easier
to use the native text export than to write a converter.

 Original message 
From: "Bosch, Juergen"  
Date:  
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] Isothermal titration calorimetry 
 
Hi George,
this is probably a very stupid suggestion and you likely have tried it, but 
I'll suggest the obvious nevertheless.
What happens to your .nitc file when you rename it to .itc can you read it in 
Origin then ?

Jürgen


On Mar 24, 2013, at 6:39 AM, George Kontopidis wrote:

Chris, indeed the nanoITC instrument analysis software is very robust and user
friendly (probably more friendly than Microcal, GE). 

However, when you need to subtract Q (heat) values from 2 or 3 blank
experiments from your experimental data, you cannot: the NanoITC software can
subtract Q values from only 1 blank experiment.
Also, if you want to present your data as heat/mol on the Y (vertical)
axis, again you cannot.  It presents data on the Y axis only as
heat/injection.
If you have found a way to subtract 2 or 3 blank experiments from
experimental data, or to present data as heat/mol, please let me know; it
will be very useful.

The main problem is that the output files from the nanoITC come with the
extension .nitc by default.  Unfortunately Origin (which can do all the above)
can only read filenames with the extension .itc

Cheers,

George

-Original Message-
From: Colbert, Christopher [mailto:christopher.colb...@ndsu.edu] 
Sent: Saturday, March 23, 2013 5:56 PM
To: George Kontopidis; CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Isothermal titration calorimetry


George, would you please explain your comments?  We've found the TA
Instruments analysis software very robust and user friendly.

We have the low volume nanoITC from TA instruments and get equivalent #'s in
our comparison tests to the Microcal instrument.

Cheers,

Chris


--
Christopher L. Colbert, Ph.D.
Assistant Professor
Department of Chemistry and Biochemistry North Dakota State University P.O.
Box 6050 Dept. 2710 Fargo, ND 58108-6050
PH: (701) 231-7946
FAX: (701) 231-8324





On 3/23/13 8:47 AM, "George Kontopidis"  wrote:

Keep in mind that output files from the nanoITC (TA Instruments) cannot be 
read by Origin.  At some point you will need to analyse your data 
further.

George

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
Anastassis Perrakis
Sent: Saturday, March 23, 2013 12:46 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Isothermal titration calorimetry

It might be worth considering the question in more detail.

Do you want to study the thermodynamics of the interaction, or would a KD do?
If the former, you need ITC. If the latter, and you want to study things
at the level of KD only, maybe investing in a plate reader, 
thermophoresis, or some biosensor technology (SPR- or interferometry-based 
systems) should be considered.

Then, what interactions will you study with the ITC? In general, I 
would agree that the lower sample volume is worth the nano options, but 
depending on the typical systems under study, sometimes the gain in 
sample quantity is not worth the money - while many times it is worth it.

John is of course right that for studying specific systems such as the one 
he describes, the 200 is great.

A. 

Sent from my iPhone

On 23 Mar 2013, at 11:00, John Fisher  wrote:

I would recommend the Microcal ITC 200, hands down. Not only is it an
amazing instrument with the optional automated sample loader (which is 
worth every penny), but we were able to do experiments (multiple) using 
FULL-LENGTH p53 binding to a weak cognate protein. I believe this was 
the first time ITC was ever used with full length p53, as it is so 
labile and just loves immediately to oligomerize. Sample sizes pay for 
the instrument.
Best,
John

John Fisher, M.D./PhD
St. Jude Children's Research Hospital Department of Oncology
Department of Structural Biology
W: 901-595-6193
C: 901-409-5699

On Mar 23, 2013, at 4:45 AM, Sameh Soror 
wrote:

Dear All,


I am sorry for the off-topic question. I am going to buy an ITC instrument
to study protein-protein & protein-ligand interactions.

I am comparing the Microcal (GE) and the nanoITC (TA Instruments).
Any suggestions, recommendations, good or bad experiences?  Is
there a better system?


Thanks in advance for the help.


Regards


Sameh

--
Sameh Soror

Postdoc. fellow




..
Jürgen Bosch
Johns Hopkins University
Bloomberg School of Public Health
Department of Biochemistry & Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Office: +1-410-614-4742
Lab:      +1-410-614-4894
Fax:      +1-410-955-2926
http://lupo.jhsph.edu






Re: [ccp4bb] Rfree reflections

2013-03-26 Thread Ed. Pozharski
As I recall, the number of reflections set aside for cross-validation also
affects the stability of sigmaA estimates.  With 500 reflections and 20
resolution shells you are down to 25 reflections per shell, which may be a bit
too low.
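
Robbie's rule of thumb below, sigma(R-free) = R-free / sqrt(Ntest), is easy
to tabulate; a quick illustration (nothing assumed beyond the formula):

# sigma(R-free) = R-free / sqrt(Ntest) (Tickle et al., Acta Cryst. D56,
# 442-450): the error margin shrinks with the square root of test-set size.
from math import sqrt

rfree = 0.25
for ntest in (500, 1000, 2000, 5000):
    print(f"Ntest = {ntest:5d}: sigma(R-free) ~ {rfree / sqrt(ntest):.4f}")

# Ntest =   500: sigma(R-free) ~ 0.0112
# Ntest =  1000: sigma(R-free) ~ 0.0079
# Ntest =  2000: sigma(R-free) ~ 0.0056
# Ntest =  5000: sigma(R-free) ~ 0.0035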

 Original message 
From: Robbie Joosten  
Date:  
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] Rfree reflections 
 
Hi Tim,

The derivation of sigma(Rw-free) is in this paper: Acta Cryst. (2000). D56,
442-450. Tickle et al.
Note the difference between the sigma of weighted/generalized/Hamilton
R-free and that of the 'regular' R-free (there is a 2 there somewhere). From
my own tests (10 fold cross-validation on 38 small datasets) I also find
sigma(R-free) = R-free/sqrt(Ntest).

For large datasets you really do not need to do k-fold cross-validation,
because sigma(R-free) can be predicted quite well. We just need to realize
that it exists.

Cheers,
Robbie

> -Original Message-
> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
> Tim Gruene
> Sent: Tuesday, March 26, 2013 11:05
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] Rfree reflections
> 
> Hi Robbie,
> 
> thank you for the explanation. Heinz Gut and Michael Hadders pointed me at
> Axel Brunger's publication Methods Enzymol. 1997;277:366-96.,
> http://www.ncbi.nlm.nih.gov/pubmed/18488318, which is where I got the
> notion of
> 500-1000 from. In this article a decrease of the error margin of Rfree
with
> n^(1/2) is mentioned (p.384), but only as an observation. Is your
statement
> "inverse proportional with the number of reflections" based on some
> statistical treatment, or also just on observation?
> 
> It is a pity that k-fold cross-validation is not standard routine, because
> it seems so easy and so quick to do with today's computers and a simple
> script. But that's probably like reminding people of not using R_int
> anymore in favour of R_meas...
> 
> Cheers,
> Tim
> 
> On Tue, Mar 26, 2013 at 10:24:51AM +0100, Robbie Joosten wrote:
> > Hi Tim,
> >
> > I don't think the 5-10% or 500-1000 reflections are real rules, but
> > rather practical choices. The error margin in R-free is inverse
> > proportional with the number of reflections in your test set and also
> > proportional with R-free itself. So for R-free to be 'significant' you
> > need some absolute number of reflections to reach your cut-off of
> > significance. This is where the 1000 comes from (500 is really pushing
the
> limit).
> > You want to make sure the error margin in R and R-free are not too far
> > apart and you probably also want to keep the test set representative
> > of the whole data set (this is particularly important because we use
> > hold-out validation, you only get one shot at validating). This is where
the
> 5%-10% comes from.
> > Another consideration for going for the 5%-10% thing is that this
> > makes it feasible to do 'full' (i.e. k-fold) cross-validation: you
> > only have to do
> > 20-10 refinements.  If you would go for 1000 reflections you would
> > have to do 48 refinements for the average dataset.
> >
> > Personally, I take 5% and increase this percentage to maximum 10% if
> > using 5% gives me a test set smaller than 1000 reflections.
> >
> > HTH,
> > Robbie
> >
> > > -Original Message-
> > > From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf
> > > Of Tim Gruene
> > > Sent: Tuesday, March 26, 2013 09:33
> > > To: CCP4BB@JISCMAIL.AC.UK
> > > Subject: [ccp4bb] Rfree reflections
> > >
> > > Dear all,
> > >
> > > I recall that the set of Rfree reflections should be 500-1000,
> > > rather than
> > 5-
> > > 10%, but I cannot find the reference for it (maybe Ian Tickle?).
> > >
> > > I would therefore like to be confirmed or corrected:
> > >
> > > Is there an absolute number required for Rfree to be significant, i.e.
> > 500-1000
> > > irrespective of the total number of unique reflections in the data
> > > set, or
> > is it
> > > 5-10% (as a compromise)?
> > >
> > > Thanks and regards,
> > > Tim
> > >
> > > --
> > > --
> > > Dr Tim Gruene
> > > Institut fuer anorganische Chemie
> > > Tammannstr. 4
> > > D-37077 Goettingen
> > >
> > > GPG Key ID = A46BEE1A
> >
> 
> --
> --
> Dr Tim Gruene
> Institut fuer anorganische Chemie
> Tammannstr. 4
> D-37077 Goettingen
> 
> GPG Key ID = A46BEE1A


Re: [ccp4bb] How to convert file format from CNS to CCP4

2013-03-23 Thread Ed Pozharski

On 03/23/2013 09:59 AM, Wei Feng wrote:
Can you help me to check out why these maps can not be converted by 
sftools?
sftools is not for manipulating map files.  Mapman from the Uppsala Software 
Factory would be a good choice.  xdlmapman, a GUI frontend to it, used 
to be part of CCP4.
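
Mapman can also be driven non-interactively by feeding commands on stdin,
e.g. from Python.  A minimal sketch (assuming mapman is on your PATH and
that these READ/WRITE format keywords match your Mapman version - check its
documentation):

# Drive Mapman (Uppsala Software Factory) non-interactively to convert a
# CNS/X-PLOR format map to CCP4 format.  The command syntax below is an
# assumption - verify against your Mapman version.
import subprocess

commands = """\
read m1 input_cns.map XPLOR
write m1 output_ccp4.map CCP4
quit
"""
subprocess.run(["mapman"], input=commands, text=True, check=True)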


--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] Philosophical question

2013-03-19 Thread Ed Pozharski

Jacob,
So you'd have to explain why the codon convention is so 
intolerant/invariant relative to the other features--it seems to me 
that either it is at an optimum or there is some big barrier holding 
it in place.


Because altering the codon convention will result in massive translation
errors.

However, the original question refers to the start codon and its relation to 
methionine.  Notice that AUG is the *only* codon for methionine. If you 
change the amino acid specificity of the methionine tRNA synthetase, you'd 
replace every methionine in every protein.  It is very unlikely that an 
organism other than one with a very small genome can survive that.  
Given the high fidelity required of tRNA synthetases, changing their 
specificity is also not easy.  Most mutations are likely to incapacitate 
the enzyme rather than switch its specificity, resulting in an organism 
that is unable to develop (due to stalled translation), let alone survive.


As for the optimization part - I am also not sure what significant 
benefit you expect from replacing the starting methionine with a different 
amino acid.  It is mostly removed anyway.  Why is that? My (uneducated) 
guess is that it is rarely structural and there is a benefit in recycling it.


Cheers,

Ed.


Re: [ccp4bb] Philosophical question

2013-03-19 Thread Ed Pozharski

On 03/19/2013 02:41 PM, Jacob Keller wrote:
I don't understand this argument, as it would apply equally to all 
features of the theoretical LUCA 
No it won't.  Different features would have different tolerance levels 
to modifications.


Philosophically, one is wrong to expect that living organisms will 
evolve in a fashion that we find optimal.  Whenever I feel that a 
protein behaves in a way I find stupid, I simply say "giraffe laryngeal 
nerve" and all comes back to normal.


Re: [ccp4bb] Strange density in solvent channel and high Rfree

2013-03-15 Thread Ed Pozharski
Check for translational NCS.

And you are way too conservative with resolution.  Even those holding
onto the Rmerge-dictated past would probably acquiesce to a lower I/sig
cutoff.  If you are using aimless, follow its recommendations based on
CC1/2; it's good for you.

Cheers,

Ed.

On Fri, 2013-03-15 at 15:39 -0300, Andrey Nascimento wrote:
> Dear all,
> 
> I have collected a good quality dataset of a protein with 64% of
> solvent in P 2 21 21 space group at 1.7A resolution with good
> statistical parameters (values for last shell: Rmerge=0.202;
> I/Isig.=4.4; Complet.=93% Redun.=2.4, the overall values are better
> than last shell). The structure solution with molecular replacement
> goes well, the map quality along the protein chain is very good, but at
> the end of refinement, after addition of a lot of waters and other
> solvent molecules, TLS refinement, etc. ... the Rfree is still quite
> high considering this resolution (1.77A) (Rfree= 0.29966 and Rfactor=
> 0.25534). Moreover, I reprocessed the data in a lower symmetry space
> group (P21), but I got the same problem, and I tried all possible
> space groups for P222, but with other screw axes I cannot even solve
> the structure.
> 
> A strange thing in the structure is the large solvent channels with a
> lot of positive electron density peaks!? I usually do not see this
> many peaks in the solvent channel. These peaks are the only
> reason for these high R's in refinement that I can find. But why are
> there so many peaks in the solvent channel???
> 
> I put a .pdf file (ccp4bb_maps.pdf) with some more information and map
> figures in this
> link: https://dl.dropbox.com/u/16221126/ccp4bb_maps.pdf
> 
> 
> Do someone have an explanation or solution for this?
> 
>  
> 
> Cheers,
> 
> Andrey
> 

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /


Re: [ccp4bb] statistical or systematic? bias or noise?

2013-03-13 Thread Ed Pozharski
Ian,

On Wed, 2013-03-13 at 19:46 +, Ian Tickle wrote:
> So I don't see there's a question of wilfully choosing to ignore. or
> not sampling certain factors: if the experiment is properly calibrated
> to get the SD estimate you can't ignore it.
> 

So perhaps I can explain better by using the same example of protein
concentration measurement.  It is certainly true that taking only one
dilution is "poor design". (Although in crystallization practice it may
not matter, given that it is not imperative to have the protein at exactly
10 mg/ml; 9.7 will do.)  If I don't bother including pipetting precision
in my error estimate, either by direct experiment or by using the
manufacturer's declaration, I am willfully ignoring this source of error.
That would be wrong.

But what if I only have one measurement's worth of sample?  And pipetting
precision cannot be calibrated (I know it can be, so this is hypothetical -
say the pipettor was stolen and the company that made it is out of business,
their offices burned down by a raging mob).  Is the pipetting error now
systematic because the experimental situation (not design) prevents it from
being sampled or estimated?

I actually like the immutable error types better for my own purposes, but
I am trying to see whether some argument might stand that allows an
error that can be sampled to be called inaccuracy nonetheless.  

Cheers and thanks,

Ed.


-- 
I don't know why the sacrifice thing didn't work.  
Science behind it seemed so solid.
Julian, King of Lemurs


Re: [ccp4bb] statistical or systematic? bias or noise?

2013-03-13 Thread Ed Pozharski
> OK.  In other words, what is potentially removable error is always
> statistical error, whether it is sampled or not.

Clarification - what I meant is potentially removable by proper sampling,
reducing the standard error to zero with an infinite number of
measurements.  Not removable by better calibration or experimental
setup.

-- 
I don't know why the sacrifice thing didn't work.  
Science behind it seemed so solid.
Julian, King of Lemurs


Re: [ccp4bb] statistical or systematic? bias or noise?

2013-03-13 Thread Ed Pozharski
Ian,

thanks - I think I had it backwards after reading your first post and
thought of controllable errors as being those that can be brought "under
control" by sampling, whereas uncontrollable errors would be those that
cannot be sampled and whose amplitude is therefore unknown.

Yet you also seem to agree that the characterization depends on the
specifics of the experimental setup, leaving the door open for the
possibility that the noise-vs-bias choice may be driven by experimental
circumstance.  

And in practice, wouldn't it be more consistent to stick with the
definition that statistical error/noise/precision is defined by what is
actually sampled?  Because if some factor is not sampled, I have zero
knowledge of the corresponding error magnitude.  I agree with Tim that
not sampling what can easily be sampled is a poorly designed experiment,
but it can also be characterized (which is probably a nicer term) as an
experiment with a large systematic error (due to poor design).

Cheers,

Ed.

On Wed, 2013-03-13 at 12:33 +, Ian Tickle wrote:
> Ed, sorry for delay.  I was not trying to make any significant
> distinction between "controllable" and "potentially controllable":
> from a statistical POV they are the same thing.  The distinction is
> purely one of practicality, i.e. within the current experimental
> parameters is it possible to eliminate the systematic error, for
> example is there a calibration step where you determine the systematic
> error by use of a standard of known concentration.  The error is still
> controllable regardless of whether you actually take the trouble to
> control it!  Note that the experimental setup has not changed, you are
> merely using the same apparatus in a different way but any random
> errors associated with the measurements will still be present.
> 
> 
> Of course if you change the experimental setup (note that this
> potentially includes the experimenter!) then all bets are off!  It's
> very important to describe the experimental setup precisely before you
> attempt to characterise the errors associated with a particular setup.
> 
> 
> BTW I agree completely with Kay's analysis of the problem: as he said
> "you are sampling (once!) a statistical error component".  This is
> what I was trying to say, he just said it in a much more concise way!
> This random (uncontrollable) error then gets propagated through the
> sequence of steps in the experiment along with all the other
> uncontrollable errors.
> 
> Cheers
> 
> 
> -- Ian
> 
> 
> 
> On 11 March 2013 19:04, Ed Pozharski  wrote:
> Ian,
> 
> thanks for the quick suggestion.
> 
> On Mon, 2013-03-11 at 18:34 +, Ian Tickle wrote:
> > Personally I tend to avoid the systematic vs random error
> distinction
> > and think instead in terms of controllable and
> uncontrollable errors:
> > systematic errors are potentially under your control (given
> a
> > particular experimental setup), whereas random errors
> aren't.
> >
> 
> Should you make a distinction then between controllable
> (cycling cuvette
> in and out of the holder) and potentially controllable errors
> (dilution)?  And the latter may then become controllable with
> a
> different experimental setup?
> 
> Cheers,
> 
> Ed.
> 
> --
> I don't know why the sacrifice thing didn't work.
> Science behind it seemed so solid.
> Julian, King of Lemurs
> 
> 
> 

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /


Re: [ccp4bb] statistical or systematic? bias or noise?

2013-03-13 Thread Ed Pozharski
Kay,

>  the latter is _not_ a systematic error; rather, you are sampling (once!) a 
> statistical error component. 

OK.  In other words, what is potentially removable error is always
statistical error, whether it is sampled or not.

So is it fair to say that if there are some factors that I either do not
know about, willfully choose to ignore, or just cannot sample, then I am
underestimating the precision of the experiment?  

Cheers,

Ed.


-- 
After much deep and profound brain things inside my head, 
I have decided to thank you for bringing peace to our home.
Julian, King of Lemurs


Re: [ccp4bb] [ccp4b] statistical or systematic? bias or noise?

2013-03-13 Thread Ed Pozharski
Adam,

OK, it seems like you are going with the "it's always statistical error, we
just don't yet know what it is" option.

Ed.

On Tue, 2013-03-12 at 16:15 +, Adam Ralph wrote:
> Hi Ed,
> 
> 
>  You can have both types of error in a single experiment, however
> you cannot determine 
> statistical (precision or as Ian says uncontrollable) error with one
> experiment. The manufacturer
> will usually give some specs on the pipette, 6ul +/- 1ul. In order to
> verify the specs
> you would need to perform many pipetting experiments. But even if the
> manufacturer does not give 
> any specs you still know that the pipette is not perfect and there
> will be a statistical error, you
> just do not know what it is.
> 
> 
> In theory, accuracy or bias could be determined with one
> experiment. Let's say you thought
> you had a 6ul pipette but actually it was a 12ul pipette. If you then
> compare the 'new' pipette
> against a standard you could tell if it was inaccurate. Of course
> normally you would repeat
> this experiment as well because of statistical error. If detected,
> bias can be removed. Systematic
> error may not be so easily detected. What if the standard is also
> biased?
> 
> 
> Adam
> 
> 
> 
> 
> > One can say it's inaccuracy when it is not estimated and imprecision
> > when it is.  Or one can accept Ian's suggestion and notice that
> there is
> > no fundamental difference between things you can control and things
> you
> > can potentially control.

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /


Re: [ccp4bb] statistical or systematic? bias or noise?

2013-03-13 Thread Ed Pozharski
Pete,

> Actually, I was trying to say the opposite - that the decision to 
> include something in the model (or not) could change the nature of the 
> error.  

Duly noted

> Pete
> 
> PS - IIUC := ?
> 

IIUC - If I Understand Correctly

-- 
Bullseye!  Excellent shot, Maurice.
  Julian, King of Lemurs.


Re: [ccp4bb] [Err] Re: [ccp4bb] statistical or systematic? bias or noise?

2013-03-11 Thread Ed Pozharski
By the way, am I the only one who gets this thing with every post?  If
anyone can ask Jin Kwang (liebe...@korea.ac.kr) to either clean up his
mailbox or unsubscribe, that would be truly appreciated.  The Delete button
is easy and fun to use, but this has been going on for quite some time.



On Tue, 2013-03-12 at 04:16 +0900, spam_mas...@korea.ac.kr wrote:
> Transmit Report:
> 
> liebe...@korea.ac.kr: 554 Transaction failed. 402 Local User Inbox Full
> (liebe...@korea.ac.kr) 4,61440,370609(163.152.6.98)
> [remainder of the bounce was garbled Korean text]

-- 
Bullseye!  Excellent shot, Maurice.
  Julian, King of Lemurs.


Re: [ccp4bb] statistical or systematic? bias or noise?

2013-03-11 Thread Ed Pozharski
Pete,

On Mon, 2013-03-11 at 13:42 -0500, Pete Meyer wrote:
> My take on it is slightly different - the difference seems to be more
> on 
> how the source of error is modeled (although that may dictate changes
> to 
> the experiment) rather than essentially depending on how the
> experiment 
> was conducted.
> 
> Or (possibly) more clearly, systematic error is a result of the model
> of 
> the experiment incorrectly reflecting the actual experiment;
> measurement 
> error is due to living in a non-deterministic universe.

I see your point. 

I want to clarify that reproducing an experiment as far back as possible
is best.  Of course it's possible to design an experiment better and
account for pipetting errors.  The question is not whether that has to be
done (certainly yes) but whether pipetting error should be considered
inaccuracy or imprecision when the experiment is not repeated.

One can say it's inaccuracy when it is not estimated and imprecision
when it is.  Or one can accept Ian's suggestion and notice that there is
no fundamental difference between things you can control and things you
can potentially control.

IIUC, you are saying that the nature of the error should be independent of
my decision to model it or not.  In other words, if I can potentially
sample some additional random variable in my experiment, it contributes
to precision whether I do so or not.  When it's not sampled, the
precision is simply underestimated.  Does that make more sense?

Cheers,

Ed.


-- 
After much deep and profound brain things inside my head, 
I have decided to thank you for bringing peace to our home.
Julian, King of Lemurs


Re: [ccp4bb] statistical or systematic? bias or noise?

2013-03-11 Thread Ed Pozharski
Ian,

thanks for the quick suggestion.

On Mon, 2013-03-11 at 18:34 +, Ian Tickle wrote:
> Personally I tend to avoid the systematic vs random error distinction
> and think instead in terms of controllable and uncontrollable errors:
> systematic errors are potentially under your control (given a
> particular experimental setup), whereas random errors aren't.
> 
Should you then make a distinction between controllable (cycling the cuvette
in and out of the holder) and potentially controllable errors
(dilution)?  And would the latter then become controllable with a
different experimental setup?

Cheers,

Ed.

-- 
I don't know why the sacrifice thing didn't work.  
Science behind it seemed so solid.
Julian, King of Lemurs


Re: [ccp4bb] statistical or systematic? bias or noise?

2013-03-11 Thread Ed Pozharski
Tim,

On Mon, 2013-03-11 at 18:51 +0100, Tim Gruene wrote:
> I don't share your opinion about a single measurement translating into
> a systematic error. I would call it a poorly designed experiment in
> case you were actually interested in how accurately you determined the
> protein concentration.
> 
OK.  As I said, this is not about protein concentration, but let's say I
only have about 6ul of protein sample, so that I can only have *one*
dilution.  Would pipetting uncertainty then be considered systematic
error or statistical error?

I am afraid this is a matter of unsettled definitions.  By the way, it
wasn't an opinion, more of an option in interpretation.  I can say that
whatever is not sampled in a particular experimental setup is systematic
error.  Or I can say (as you seem to suggest, and I like this
option better) that whenever there is a theoretical possibility of
sampling something, it is statistical error even though the particular
setup does not allow accounting for it.

Ed.

-- 
"Hurry up before we all come back to our senses!"
   Julian, King of Lemurs


[ccp4bb] statistical or systematic? bias or noise?

2013-03-11 Thread Ed Pozharski
Salve,

I would like to solicit opinions on a certain question about the
relationship between statistical and systematic error. Please read and
consider the following in its entirety before commenting.

Statistical error (experimental precision) is determined by the degree to
which an experimental measurement is reproducible. It is derived from the
variance of the data when an experiment is repeated multiple times under
otherwise identical conditions. Statistical error is by its very nature
irremovable and originates from various sources of random noise, which
can be reduced but not entirely eliminated.

Systematic error (experimental accuracy) reflects the degree to which the
precise average deviates from the true value. Theoretically, corrections can
be introduced to the experimental method that eliminate various sources of
bias. Systematic error refers to some disconnect between the quantities
one tries to determine and what is actually measured.

The issue is whether the classification of various sources of error into
the two types depends on procedure. Let me explain using an example.

To determine the concentration of a protein stock, I derive the extinction
coefficient from its sequence, dilute the stock 20x, and take an OD
measurement.  The OD value is then divided by the extinction coefficient and
multiplied by 20 to calculate the concentration.

So what is the statistical error of this when I am at the
spectrophotometer? I can cycle the sample cuvette in and out of the holder
to correct for the reproducibility of its position and for instrument noise.
This gives me the estimated statistical error of the OD measurement.
Scaled by the extinction coefficient and dilution factor, this number
corresponds to the statistical error (precision) of the protein
concentration.

There are two sources of systematic error, originating from the two
factors used to convert OD to concentration. The first is the irremovable
inaccuracy of the extinction coefficient. 

Second: the dilution factor. Here the main contribution to the systematic
error is pipetting. Importantly, this includes both systematic (pipettor
calibration) and statistical (pipetting precision) error. Notice that I
only prepared one sample, so if on that particular instance I picked up
4.8ul and not 5.0ul, this will translate into systematically
underestimating the protein concentration, even though it could equally
likely have been 5.2ul.

So if pipetting error could have contributed ~4% to the overall
systematic error while the spectrophotometer measures with 0.1%
precision, it makes sense to consider how this systematic error can be
eliminated. The experiment can be modified to include multiple samples
prepared for OD determination from the same protein stock.

An interesting thing happens when I do that. What used to be a
systematic error of pipetting now becomes statistical error, because my
experiment now includes reproducing the dilution of the stock. In a
nutshell,

Whether a particular source of error contributes to accuracy or
precision of an experiment depends on how experiment is conducted. 

And one more thing. There is no need to waste precious protein on evaluating
the pipetting error. I can determine that from a separate calibration
experiment using a lysozyme solution of comparable concentration/surface
tension. Technically, a single measurement has an accuracy of said 4%
(padded by whatever the error in the extinction coefficient is). But one can
also project that with actual dilution repeats, the precision would be
this same 4% (assuming that this is the dominant source of error).

So, is there anything wrong with this? Naturally, the question really is
not about extinction coefficients, but rather about the semantics of what is
accuracy and what is precision, and whether a certain source of
experimental error is rigidly assigned to one of the two categories.
There is, of course, the Wikipedia article on accuracy vs precision, and
section 3.1 of Ian's paper (Acta D 68:454) can be used as a point of
reference.
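
To make the pipetting example concrete, here is a minimal simulation sketch
(all numbers hypothetical) of how the same 4% pipetting error shows up as a
bias when the dilution is made once, but as sampled noise once the dilution
itself is replicated:

# The same 4% pipetting error acts as a systematic offset when the dilution
# is made once, but is sampled as statistical noise when it is replicated.
import numpy as np

rng = np.random.default_rng(0)
true_conc = 10.0     # mg/ml, "true" stock concentration
pipet_cv = 0.04      # 4% relative pipetting error
od_cv = 0.001        # 0.1% spectrophotometer precision
n = 10

# One dilution, many OD readings: the pipetting error is frozen in
dilution_err = rng.normal(1.0, pipet_cv)     # drawn once!
readings = true_conc * dilution_err * rng.normal(1.0, od_cv, n)
print(f"single dilution:      {readings.mean():.3f} +/- {readings.std():.3f}")
# tight spread (~0.01) around a value offset from 10.0 by up to several %

# Replicated dilutions: the pipetting error is now sampled as spread
per_dil = true_conc * rng.normal(1.0, pipet_cv, n) * rng.normal(1.0, od_cv, n)
print(f"replicated dilutions: {per_dil.mean():.3f} +/- {per_dil.std():.3f}")
# spread is now roughly 4%, and the mean converges to 10.0 as n grows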

Cheers,

Ed.

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the great Tao is abandoned, Ideas of "humanitarianism" and 
   "righteousness" appear.
When intellectualism arises It is accompanied by great hypocrisy.
When there is strife within a family Ideas of "brotherly love" appear.
When nation is plunged into chaos Politicians become "patriotic".
--   / Lao Tse /


[ccp4bb] qtrview command line options

2013-03-11 Thread Ed Pozharski
Is there some way of opening a log file (specifically, the
pointandscale.log that the imosflm bridge to scala generates) with qtrview
from the command line?

I tried, of course, this

qtrview pointandscale.log

but it opens empty, with no log file. I tried qtrview -h and qtrview --help
and man qtrview, but there is seemingly no documentation.

I found the source code (yes, I can google) and can deduce that
available options at startup are 

--log-file
--report-xml
--report-xrt
--inp-file

The only thing that works is 

qtrview --log-file pointandscale.log

but that only shows me the log file itself, i.e. no graphs etc.  I
understand that the program was designed primarily for the ccp4 gui and I
know loggraph (and it works).

By the way, checkout instructions for the qtrview repository at

https://fg.oisin.rc-harwell.ac.uk/projects/qtrview/

don't work, throwing this error

bzr: ERROR: Connection error: curl connection error (server certificate
verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile:
none)
on https://fg.oisin.rc-harwell.ac.uk/anonscm/bzr/qtrview/.bzr/smart 

Cheers,

Ed.

-- 
"Hurry up before we all come back to our senses!"
   Julian, King of Lemurs


Re: [ccp4bb] compiling refmac5 on Ubuntu 12.04

2013-03-06 Thread Ed Pozharski

Thierry,

I ran both versions on the same input file and numerical results are 
essentially the same.  After 10 cycles of refinement the r.m.s.d. 
between two models produced with different versions is <0.0004A.  
Basically, about 5% of coordinates differ in the last digit (i.e. by 
0.001A).  Frankly, I expected results to be exactly identical, but the 
difference is too small to be of concern, imho.


Cheers,

Ed.

On 03/04/2013 10:19 AM, Fischmann, Thierry wrote:

Ed,

Are the numerical results the same ? Not likely that there is a problem. But if 
you haven't done it already it is worth checking by running the tests provided 
with the suite. Aggressive optimization can be a source of bugs.

Best regards
Thierry

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Ed 
Pozharski
Sent: Monday, March 04, 2013 8:55 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] compiling refmac5 on Ubuntu 12.04

On Mon, 2013-03-04 at 11:37 +, Marcin Wojdyr wrote:

Running times were, correspondingly, 32.2s, 35.1s and 18.7s.


Numbers are almost too impressive to believe :)

How does it compare with ifort (which I thought should be the fastest
option on intel processors and thus unavailable (not free) for most DIY
compilation given licensing issues)?




--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] compiling refmac5 on Ubuntu 12.04

2013-03-06 Thread Ed Pozharski

On 03/04/2013 10:02 AM, Marcin Wojdyr wrote:

It also puzzled me, but I haven't done more careful benchmarking yet.
What did you get after compiling refmac?


My numbers are not as impressive, but I also get a quite detectable 
improvement, from 35s to 27s (ccp4-6.3.0 vs compiled from source). This 
is with gcc-4.6, though.  Importantly, both binaries are the *same* 
version of refmac, 5.7.0032.


--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] compiling refmac5 on Ubuntu 12.04

2013-03-04 Thread Ed Pozharski
On Mon, 2013-03-04 at 11:37 +, Marcin Wojdyr wrote:
> Running times were, correspondingly, 32.2s, 35.1s and 18.7s.
> 
Numbers are almost too impressive to believe :)

How does it compare with ifort (which I thought should be the fastest
option on intel processors and thus unavailable (not free) for most DIY
compilation given licensing issues)?

-- 
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] compiling refmac5 on Ubuntu 12.04

2013-03-04 Thread Ed Pozharski
Indeed, the problem goes away when the -static flag is omitted.
Interestingly, the resulting binary's dependencies do not include any
ccp4-related libraries.  For those interested, I was able to track the
segfault down to the close() operator - so basically it fails when
closing a file opened with the ccpdpn routine.  At that point I had a lucky
guess of removing the flag (somewhat inspired by noticing that the refmac5
binary from ccp4-6.3.0 is dynamic, and after trying to compile separately
a short piece of code that only opened and closed a file), so the issue
is solved as far as my goals are concerned. 

On Mon, 2013-03-04 at 10:04 +, Garib N Murshudov wrote:
> Dear all
> 
> 
> I think this error has been dealt with (Ed will correct me if I am
> wrong). The problem was -static in compilation. For whatever reason, in
> some gcc (gfortran) versions -static does not work (it compiles, but the
> result has problems at runtime; why is not clear to me). Sometimes
> in later gcc -static-libgcc -static-libgfortran works, but not always.
> These flags are needed for distribution purposes. If you are compiling
> and using on the same computer then you should not need them.
> 
> 
> regards
> Garib
> 
> 
> 
> On 4 Mar 2013, at 09:56, Adam Ralph wrote:
> 
> > Dear Ed,
> > 
> > 
> >The error does indeed happen in ccp4lib. One of the first
> > routines called by CCP4
> > progs is "ccp4fyp". This initialises the CCP4 environment. See
> > lib/cctbx/cctbx_sources/ccp4io/lib/src/ccp4_general.c.
> > 
> > 
> > If you look at the code you can see that $CINCL is determined at
> > run-time. You are 
> > right that this environment var is not needed at compile time. Files
> > like environ.def and 
> > default.def are read at this time. Perhaps there has been a
> > corruption of one of these 
> > files or you are pointing to an earlier version of $CINCL. Does the
> > error occur with
> > refmac alone or with every CCP4 prog?
> > 
> > 
> > Adam
> > 
> > 
> 
> Dr Garib N Murshudov
> Group Leader, MRC Laboratory of Molecular Biology
> Hills Road 
> Cambridge 
> CB2 0QH UK
> Email: ga...@mrc-lmb.cam.ac.uk 
> Web http://www.mrc-lmb.cam.ac.uk
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /


Re: [ccp4bb] compiling refmac5 on Ubuntu 12.04

2013-03-04 Thread Ed Pozharski
Adam,

On Mon, 2013-03-04 at 09:56 +, Adam Ralph wrote:
> One of the first routines called by CCP4
> progs is "ccp4fyp". This initialises the CCP4 environment. 

I think you might have missed in my original post that I get an error
when I *do* source the ccp4 environment.
> 
> Does the error occur with
> refmac alone or with every CCP4 prog? 

I cannot tell because I am compiling only refmac, not all of the ccp4.

-- 
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] compiling refmac5 on Ubuntu 12.04

2013-02-28 Thread Ed Pozharski
Adam,

I am not compiling CCP4, just refmac.  IIUC, all that sourcing
ccp4.setup does is set $CLIB for the refmac makefile to find libccp4c
and libccp4f.  And presumably lapack and libblas, but that's a separate
issue.

On Thu, 2013-02-28 at 10:28 +, Adam Ralph wrote:
> Hi Ed,
> 
> 
>It looks as though you have not sourced $CCP4/include/ccp4.setup.
> This needs to be customized and sourced before you configure and make
> CCP4.
> 
> 
> Adam
>  

-- 
Bullseye!  Excellent shot, Maurice.
  Julian, King of Lemurs.


[ccp4bb] compiling refmac5 on Ubuntu 12.04

2013-02-27 Thread Ed Pozharski
I am trying to compile refmac from source on a machine running Ubuntu
12.04.  In a nutshell, after some troubleshooting I end up with an
executable that generates a segmentation fault.  The log file states that

>> CCP4 library signal ccp4_parser:Failed to open external command
file (Success)
 raised in ccp4_parser <<

(hardly a success).  Potentially relevant details are that I had to
compile libccp4 and libmmdb to get to this point.  If I don't configure
the CCP4, I get this when trying to run refmac

>> CCP4 library signal ccp4_general:Cannot open environ.def (Error)
 raised in ccp4fyp <<
 refmacgfortran:  Cannot open environ.def
 refmacgfortran:  Cannot open environ.def

So perhaps it's some incompatibility between the libccp4/libmmdb that I
compiled and those that came with the CCP4 installation (by the way, the new
update feature rocks indeed).  But I tried lifting these libraries from the
CCP4 installation when compiling refmac and I get the same segmentation
fault.

Any suggestions for troubleshooting/advice on how to compile refmac from
source are appreciated.  

Refmac version is 5.7.0032.

Cheers,

Ed.

-- 
"I'd jump in myself, if I weren't so good at whistling."
   Julian, King of Lemurs


Re: [ccp4bb] Link problem with Refmac.

2013-02-21 Thread Ed. Pozharski
We might have just found a new recurring discussion - what to do with insertion 
codes! I am sure the opinion split is close to 50/50.

Personally, I don't think insertion codes make sense in the first place. Are 
catalytic triad residues always the same distance from the N-terminus? No. The 
residue number designates a residue's position in the sequence, not its 
relationship to other like-minded enzymes.  That is a useful (as it defines 
peptide bonds) and consistent definition.  Even with antibodies, where these 
can indeed be interpreted as insertions, the mess is enormous, since only half 
the structures conform to Wu & Kabat numbering (and there are two of those).

It is true, however, that sorting which pays attention to insertion codes is 
just as easy to implement as structural-alignment-based residue matching.

Cheers,

Ed.
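
Herman's distance criterion (quoted below) is simple enough to apply while
parsing coordinates.  A minimal sketch (coordinates made up; only the ~2.0 A
cutoff comes from the discussion):

# Treat two consecutive residues as peptide-bonded only if the C(i)-N(i+1)
# distance is below a cutoff, instead of trusting residue numbering.
from math import dist  # Python >= 3.8

def peptide_linked(c_xyz, n_xyz, cutoff=2.0):
    """c_xyz: (x,y,z) of C in residue i; n_xyz: (x,y,z) of N in residue i+1."""
    return dist(c_xyz, n_xyz) < cutoff

# A real peptide bond is ~1.33 A; a chain break is many Angstroms
print(peptide_linked((13.6, 2.0, 1.0), (14.3, 3.1, 1.2)))  # True  (~1.3 A)
print(peptide_linked((13.6, 2.0, 1.0), (20.0, 9.0, 5.0)))  # False (~10 A)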

 Original message 
From: herman.schreu...@sanofi.com 
Date:  
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] Link problem with Refmac. 
 
I fully second this. The treatment of insertion codes by many programs
is a mess, leaving you with the option to either renumber (and lose contact
with an often huge body of existing literature) or stuff the pdb with
link and gap records (if recognized at all by the program used).  

It would be a great help if the programmers would use a simple distance
criterion (e.g. N - C distance < 2.0 A) to decide whether amino acids
are linked instead of forcing a link between residues which are more
than 10 A apart as in the current case.

Cheers,
Herman


-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
Robbie Joosten
Sent: Monday, February 18, 2013 10:57 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Link problem with Refmac.

Hi Ian,

I avoid renumbering whenever I can. If I do have to renumber things
(e.g. to get proper connectivity in PDB entry 2j8g), I do it by hand. So
no help there.

As for dealing with insertion codes in general, why not try to convince
the developers of the 'brain-damaged' programs to support insertion codes? I've
asked quite a few for these sorts of updates and many were very helpful.
The problem is that most developers discover the existence of insertion
codes after they set up a data structure for the coordinates. Adding
support afterwards can be quite a hassle. The more users ask for such
support, the more likely it will be implemented. 

Cheers,
Robbie 


> -Original Message-
> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
> Ian Tickle
> Sent: Monday, February 18, 2013 19:40
> To: CCP4BB@JISCMAIL.AC.UK
> Subject: Re: [ccp4bb] Link problem with Refmac.
> 
> Hi Robbie
> 
> 
> OK I just realised what's going on.  In my script I renumber the input PDB
> file (starting at 1 for each chain and incrementing by 1) and keep the
> mapping so I can renumber it back afterwards for human consumption.  So
> you're completely correct: there is indeed a residue A59 after renumbering!
> This is to avoid headaches with brain-damaged programs that can't cope with
> insertion codes and residue numbers out of sequence.  So I guess I'm going
> to have to be smarter in my renumbering program and make sure I maintain
> any increasing gaps in the numbering which indicate real gaps in the
> sequence and only renumber over insertions and decreasing gaps.  It doesn't
> actually matter what the new numbers are since the user never sees them.
> 
> 
> But this must be a common problem: how do others handle this?  E.g. pdbset
> blindly renumbers with an increment of 1 (and anyway it doesn't renumber
> any LINK, SSBOND & CISPEP records as I do) so it would have the same
> problem.
> 
> 
> Cheers
> 
> 
> -- Ian
> 
> 
> 
> On 18 February 2013 17:09, Robbie Joosten 
> wrote:
> 
> 
> Hi Ian,
> 
> The warning refers to a MET 59 in chain A whereas you only have MET 72.
> That is very suspicious. Non-sequential residues further apart than x
> Angstrom automatically get a gap record. Have you tried a newer version
> of Refmac, because this feature was added quite a while ago?
> What is your setting for 'MAKE CONN' when you run Refmac?
> 
> Cheers,
> Robbie
> 
> 
> 
> 
> > -Original Message-
> > From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On
Behalf 
> Of
> > Ian Tickle
> > Sent: Monday, February 18, 2013 17:32
> > To: CCP4BB@JISCMAIL.AC.UK
> > Subject: [ccp4bb] Link problem with Refmac.
> >
> >
> > All, I'm having a problem with Refmac (v. 5.7.0025) that I
don't
> understand.
> > It's linking 2 residues that it shouldn't be.  Here's the
relevant 
> message
> in the
> > log file:
> >
> >   WARNING : large distance for conn:TRANS    dist =    10.768
> > ch:AA   res:  58  THR -->  59  MET
> ideal_dist= 1.329
> >
> > Note that there are no LINK (or LINKR) records in the PDB
header.
> >
> >
> > Here are the input co-ords for the relevant residues (not
linked):
> >
> > ATOM    887  N   THR A  58  13.587 

Re: [ccp4bb] Renumbering and retaining state information pdb file

2013-02-14 Thread Ed. Pozharski
There should be many ways to do this.  You can split the file, renumber with 
pdbset, and then reassemble it.  This may be useful
http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Useful_scripts_(aka_smart_piece_of_code)
Cheers, 
Ed
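
If a script is more convenient, here is a minimal sketch (assumptions: a
standard fixed-column PDB file and a simple constant offset; filenames and
offset are hypothetical) that renumbers residues while leaving the
MODEL/ENDMDL state records untouched:

# Renumber residues in a multi-MODEL (NMR) PDB by a fixed offset, leaving
# MODEL/ENDMDL records (the state information) intact.  The residue number
# occupies PDB columns 23-26, i.e. Python slice [22:26].
OFFSET = 100  # hypothetical offset

with open("input.pdb") as fin, open("renumbered.pdb", "w") as fout:
    for line in fin:
        if line.startswith(("ATOM", "HETATM", "TER")) and len(line) > 26:
            resseq = int(line[22:26]) + OFFSET
            line = line[:22] + f"{resseq:4d}" + line[26:]
        fout.write(line)  # MODEL/ENDMDL and everything else pass through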

 Original message 
From: Amar Joshi  
Date:  
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] Renumbering and retaining state information pdb file 
 
Hi,

I am trying to tidy up an old NMR structure (not mine) and I want to renumber 
the residues in all the states.
When I use pdbset starting at X, the numbers are changed but the state 
information is lost. How can I retain the state in my modified pdb file?

Thanks,
Amar

-
Amar Joshi Ph.D

Bayliss Lab
Department of Biochemistry
Henry Wellcome Building
University of Leicester
Lancaster Road, Leicester
LE1 9HN

Tel: 0116 229 7082
Email: aj...@leicester.ac.uk
--



Re: [ccp4bb] S-nitrosylation protein

2013-02-13 Thread Ed. Pozharski
Maybe you can try different energies, hoping that the damage is wavelength
dependent.  It must be dose dependent though, so you may consider merging
short sweeps from multiple crystals.

 Original message 
From: Uma Ratu  
Date:  
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] S-nitrosylation protein 
 
Dear All:
 
I plan to use X-ray crystallography to study the S-nitrosylated protein 
structure.
 
The native protein crystals diffracted to 2A at a synchrotron. I now have 
crystals of the S-nitrosylated protein.
 
Since the S-NO moiety appears to be unstable under synchrotron radiation, could 
you advise / comment on a strategy for data collection from S-nitrosylated 
protein crystals?
 
The protein crystals did not diffract well with the in-house X-ray.
 
Thank you for your comments.
 
Uma



Re: [ccp4bb] refmac5 MMA bug

2013-02-11 Thread Ed Pozharski
On Mon, 2013-02-11 at 09:56 +0100, Robbie Joosten wrote:
> This is a 'compatability' option in Refmac that internally renames
> atoms. If you comment out 'MMA .C7 CM' in your
> mon_lib_list.cif file, the problem will disappear.
> 

Robbie,

thanks a lot - this fixes it.  

Is this still considered a bug?  From what I understand, the
data_comp_synonym_atom_list entry indicates that whenever an MMA C7 atom is
encountered, it will be internally renamed to CM.  However,
$CCP4_LIB/data/monomers/m/MMA.cif should then refer to CM as well. But
that cif file still uses C7.

Maybe this gets fixed in ccp4 updates, which reminds me to get that set
up at last.

Cheers,

Ed.

-- 
After much deep and profound brain things inside my head, 
I have decided to thank you for bringing peace to our home.
Julian, King of Lemurs


[ccp4bb] refmac5 MMA bug

2013-02-10 Thread Ed Pozharski
I see a strange issue with a model that includes O1-methyl-mannose 
(three-letter code MMA).  Basically, refmac fails, saying that C7 is 
missing from the model while "CM" is absent from the library.  The problem 
is that there is no CM atom in the pdb file, while C7 is right there. 
This happens with Refmac_5.7.0029, and I see no obvious issues with the 
corresponding cif file in the monomer library.


--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] protein crystals or salt crystals

2013-02-08 Thread Ed Pozharski
Michael,

It seems to me we have no disagreement, as we both say that it is
*unusual* for protein crystals to be non-fragile.  Furthermore, my
objection is to the "gelatine" characterization.  I may be, as is my custom,
wrong, but in terms of elasticity gels are purely entropic.  Protein
crystals, even the malleable ones you describe, have both enthalpic and
entropic components to the deformation free energy (admittedly the
entropic component is stronger than in "regular" materials).

Your comment on the PEG-induced slow cross-linking is very interesting.
What you describe fits perfectly the behavior of long rods of a
deliberately cross-linked crystal - they bend and don't snap.  Protein
crystals (at least some) are also characterized by an unusually broad
range of linear response to mechanical deformation, up to 10%:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2143058/

According to the theory in this paper, though, you might be destroying
diffraction by bending the crystal, since in-crystal denaturation is
likely irreversible.  From what you are describing (60 degrees,
150x40x40 um), the linear deformation should reach almost 50%,
definitely enough for partial "crystal denaturation".

Cheers,

Ed.



On Fri, 2013-02-08 at 11:22 -0500, R. M. Garavito wrote:
> Ed,
> 
> 
> >  Protein crystals are fragile but not soft.  
> >  If your crystals are like gelatine it's unusual.
> 
> 
> I hate to disagree with the disagreement, but there are many
> exceptions to this rule.  I have seen many protein crystals that are
> quite malleable and bendable.  One protein produced rod-shaped
> crystals  (150x40x40 um) that I could bend by almost 60 degrees and it
> would slowly snap back.  Mounting it old school was a real pain, and
> their diffraction was mediocre.  However, the majority of the crystals
> I have worked with adhere to the general rule you describe.  
> 
> 
> Where the crystal physical behavior is anomalous, it is often when PEG
> is used and/or there are multiple components that contribute to the
> crystal's integrity (as in the case of membrane protein crystals with
> detergent).  
> 
> 
> In the former case, crystals that sit in PEG solutions too long tend
> to be cross-linked (most likely due to the aldehydes that can exist in
> some batches of PEG).  One could argue that the crosslinking adds
> long-range elasticity and a resistance to fracturing.
> 
> 
> In the latter case, I have observed large beautiful crystals of
> membrane proteins that have the consistency and malleability of warm
> butter.  Sometimes optimization improved their integrity, and other
> times a new crystal form is needed.
> 
> 
> Regards,
> 
> 
> Michael
> 
> 
> 
> 
> 
> 
> R. Michael Garavito, Ph.D.
> Professor of Biochemistry & Molecular Biology
> 603 Wilson Rd., Rm. 513   
> Michigan State University  
> East Lansing, MI 48824-1319
> Office:  (517) 355-9724 Lab:  (517) 353-9125
> FAX:  (517) 353-9334Email:  rmgarav...@gmail.com
> 
> 
> 
> 
> 
> 
> 
> On Feb 8, 2013, at 9:23 AM, Ed Pozharski wrote:
> 
> > On Fri, 2013-02-08 at 14:53 +0400, Evgeny Osipov wrote:
> > > Protein crystals behave rather as gelatine and not as solid
> > 
> > I'd have to disagree on that.  Protein crystals are fragile but not
> > soft.  If your crystals are like gelatine it's unusual.  It has been
> > demonstrated that elastic properties of protein crystals are similar
> > to
> > "organic solids". 
> > 
> > Cheers,
> > 
> > Ed.
> > 
> > -- 
> > "I'd jump in myself, if I weren't so good at whistling."
> >   Julian, King of Lemurs
> > 
> 

-- 
Edwin Pozharski, PhD, Assistant Professor
University of Maryland, Baltimore
--
When the Way is forgotten duty and justice appear;
Then knowledge and wisdom are born along with hypocrisy.
When harmonious relationships dissolve then respect and devotion arise;
When a nation falls to chaos then loyalty and patriotism are born.
--   / Lao Tse /


Re: [ccp4bb] protein crystals or salt crystals

2013-02-08 Thread Ed Pozharski
On Fri, 2013-02-08 at 09:57 -0500, Jacob Keller wrote:
> do you have a reference quickly on hand 

http://www.ncbi.nlm.nih.gov/pubmed/8129868

and references therein

http://www.sciencedirect.com/science/article/pii/S0022024801010922

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1300955/

The last reference is not, strictly speaking, about protein crystals, but it
also emphasizes how protein as an elastic material is more of a solid (with
some peculiar properties likely arising from a large entropic contribution
to its deformation energy).

-- 
Bullseye!  Excellent shot, Maurice.
  Julian, King of Lemurs.


Re: [ccp4bb] protein crystals or salt crystals

2013-02-08 Thread Ed Pozharski
On Fri, 2013-02-08 at 14:53 +0400, Evgeny Osipov wrote:
> Protein crystals behave rather as gelatine and not as solid

I'd have to disagree on that.  Protein crystals are fragile but not
soft.  If your crystals are like gelatine it's unusual.  It has been
demonstrated that elastic properties of protein crystals are similar to
"organic solids". 

Cheers,

Ed.

-- 
"I'd jump in myself, if I weren't so good at whistling."
   Julian, King of Lemurs

