Re: [ccp4bb] a challenge

2013-01-15 Thread Santosh Panjikar
Hi James,
The datasets frac0.80.mtz to frac1.00.mtz are challenging to solve using SAD
phasing. However, these datasets can easily be solved using
another experimental phasing method: instead of the anomalous signal we could
use the isomorphous signal alone, for example with RIP or SIR
phasing, since there is a difference in intensity between the datasets due
to the different scattering of S and Se. Since frac0.80.mtz contains
20% selenium, that is sufficient to solve the structure against
frac1.00.mtz. It seems the structure can be solved with as little as 10%
selenium content (frac0.90.mtz vs frac1.00.mtz), and the substructure can be
found easily. This is not surprising, as the pair of datasets is
quite isomorphous. We phase all reflections (centric and acentric), whereas
with anomalous phasing we could phase acentric reflections
only. In fact, Single Isomorphous Replacement was the first
phasing technique. It was later extended, with some variation, by
Ravelli et al. through the introduction of X-ray or UV RIP phasing.
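Why anomalous data cannot phase centric reflections can be checked numerically. The sketch below builds a two-atom toy model in P21 (unique axis b) with made-up scattering factors (not tabulated values, and not the challenge structure) and computes Bijvoet differences |F(hkl)| - |F(-h-k-l)|. In the centric h0l zone the Friedel mates have identical amplitudes even with an anomalous scatterer present, so SAD has nothing to work with there; an isomorphous (Se vs S) difference has no such restriction.

```python
import cmath
import math

# Illustrative scattering factors in electrons; hypothetical numbers, NOT tabulated values.
# Se carries an imaginary part f'' so it scatters anomalously; C does not.
F_SE = complex(34 - 8, 4)   # f0 + f' + i*f''
F_C = complex(6, 0)

# Toy P2_1 (unique axis b) model: (scattering factor, fractional x, y, z).
ATOMS = [(F_SE, 0.10, 0.20, 0.30), (F_C, 0.35, 0.60, 0.15)]


def p21_images(x, y, z):
    """The two symmetry-equivalent positions in P2_1: (x,y,z) and (-x,y+1/2,-z)."""
    return [(x, y, z), (-x, y + 0.5, -z)]


def structure_factor(h, k, l):
    """F(hkl) = sum_j f_j exp(2*pi*i*(hx + ky + lz)) over all atoms and images."""
    total = 0j
    for f, x, y, z in ATOMS:
        for xs, ys, zs in p21_images(x, y, z):
            total += f * cmath.exp(2j * math.pi * (h * xs + k * ys + l * zs))
    return total


def bijvoet_difference(h, k, l):
    """|F(hkl)| - |F(-h-k-l)|: the anomalous signal available to SAD."""
    return abs(structure_factor(h, k, l)) - abs(structure_factor(-h, -k, -l))


def is_centric_p21(h, k, l):
    """In point group 2 (unique axis b), only the h0l zone is centric."""
    return k == 0


for hkl in [(1, 0, 2), (0, 2, 0)]:
    kind = "centric" if is_centric_p21(*hkl) else "acentric"
    print(hkl, kind, bijvoet_difference(*hkl))
```

The centric difference comes out at rounding-error level, while the acentric 0k0 reflection shows a Bijvoet difference of a few electrons for these invented numbers.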

I tried the RIP (SIR) phasing protocol of Auto-Rickshaw using frac0.90.mtz as
'before' and frac1.00.mtz as 'after'. Auto-Rickshaw used
SHELXC/D/E and ARP/wARP/REFMAC5 to obtain a partially refined model (Rfree
below 30%).

Cheers
Santosh

Santosh Panjikar, Ph.D.
Scientist
Australian Synchrotron
800 Blackburn Road
Clayton VIC 3168
Australia
Ph: +61-4-67770851


Re: [ccp4bb] a challenge

2013-01-15 Thread Savvas Savvides
Dear James

 I actually chose 3dko because it is a kinase (with a ligand), and
 therefore an interesting candidate for a molecular replacement
 score.  I have not set this up yet, but I think if you look for PDB
 entries that contain the word kinase and try to molecular-replace
 all of them into the 3dko dataset, what fraction of them will work?
 I think that fraction would make a good score for a given molecular
 replacement pipeline.

At the recent CCP4 Study Weekend in Nottingham, Giovanna Scapin from Merck gave
a talk on MR during which she reflected on their attempts, some time ago, to
troubleshoot a recalcitrant MR case of a kinase by searching with hundreds of
models derived from all kinase structures known at that time. However, I am not
sure whether they published these results anywhere (at least I could not fish
out a relevant reference).

Along these lines, 'Wide Search MR' (Stokes-Rees and Sliz (2010) PNAS 107:
21476-21481; www.sbgrid.org) may also provide a way to establish
such benchmarks or MR 'scores'.

Best regards
Savvas

 
 


Re: [ccp4bb] a challenge

2013-01-15 Thread George M. Sheldrick
Dear Santosh,

I think that it is a bit more complicated. SIR generally provides a
stronger phasing signal than SAD, and can be better for phasing,
provided that:

(a) the native and derivative are sufficiently isomorphous, AND
(b) the heavy atom substructure is itself chiral.

For some space groups one site is enough to generate a chiral
substructure, but for others, e.g. P21, more than one site is necessary.
Otherwise the first map will be a double image consisting of two
overlapping positive images, and density modification will not in
general be able to untangle them. SAD also gives a double image in such
cases, but then instead of two positive images we have one negative and
one positive image, and the simplest form of density modification -
setting negative density to zero - will break the pseudosymmetry. One
can also break such pseudosymmetry by using SIRAS or RIPAS instead of
SIR or RIP, even if the anomalous signal alone is not sufficient to
phase the structure.
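The point about a single site in P21 can be checked with a small sketch. It assumes space group P21 with unique axis b, whose allowed origin shifts are (0 or 1/2, any y, 0 or 1/2). A substructure is then non-chiral if its inverted copy can be superposed on the original by one of those shifts, which is the case that gives the SIR double image described above. The function names and the tolerance are hypothetical, and only P21 is handled.

```python
TOL = 1e-3  # coordinate tolerance in fractional units (arbitrary choice)


def wrap(d):
    """Map a fractional difference onto [-0.5, 0.5)."""
    return (d + 0.5) % 1.0 - 0.5


def p21_images(site):
    """Symmetry-equivalent positions under P2_1 (unique axis b), reduced mod 1."""
    x, y, z = site
    return [(x % 1, y % 1, z % 1), (-x % 1, (y + 0.5) % 1, -z % 1)]


def same_site(a, b):
    """True if two fractional positions coincide modulo lattice translations."""
    return all(abs(wrap(ai - bi)) < TOL for ai, bi in zip(a, b))


def is_half_grid(v):
    """True if v is 0 or 1/2 modulo 1: the allowed x and z origin shifts in P2_1."""
    return abs(wrap(v)) < TOL or abs(wrap(v - 0.5)) < TOL


def superposes(moving, fixed, shift):
    """Do all symmetry images of the shifted 'moving' sites land on images of 'fixed'?"""
    targets = [img for s in fixed for img in p21_images(s)]
    for s in moving:
        moved = tuple(c + d for c, d in zip(s, shift))
        if not all(any(same_site(img, t) for t in targets) for img in p21_images(moved)):
            return False
    return True


def is_chiral_p21(sites):
    """A substructure is chiral if no allowed origin shift maps its inverse onto it."""
    inverted = [tuple(-c for c in s) for s in sites]
    for target in (img for s in sites for img in p21_images(s)):
        # Candidate shift carrying the first inverted site onto this image.
        shift = tuple(t - c for t, c in zip(target, inverted[0]))
        if is_half_grid(shift[0]) and is_half_grid(shift[2]) and superposes(inverted, sites, shift):
            return False  # inverse is equivalent: SIR alone would give a double image
    return True


print(is_chiral_p21([(0.10, 0.20, 0.30)]))                      # one site -> False
print(is_chiral_p21([(0.10, 0.20, 0.30), (0.35, 0.60, 0.15)]))  # two sites -> True
```

Special two-site arrangements can still be centrosymmetric (for example two sites sharing x and z), which is why "more than one site" is necessary but not always sufficient.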

If MAD doesn't work and one happens to have a native (Met) dataset as
well as SeMet, one should always consider analyzing the data as SIRAS.
Whether this is better than SAD on the SeMet data alone will depend
primarily on how isomorphous the two datasets are.

Best wishes, George



Re: [ccp4bb] a challenge

2013-01-15 Thread James Holton

Santosh,

Although I appreciate your ingenuity, and I agree that SIRAS is an
excellent idea in the real world if you have only partial Se occupancy,
I'm afraid I think it is cheating to use more than one of the
challenge datasets at a time.  The scenario I wanted to test is the
all-too-common 'we only had that one good crystal' situation.


Then again, I do think it is interesting to ask how low the Se
incorporation can go before SIRAS fails, even if only under the
idyllic, perfectly isomorphous conditions here.  I have now put up 1%
increments between frac0.90.mtz and frac1.00.mtz.  Do you think
you/Auto-Rickshaw can solve it with frac0.99.mtz vs frac1.00.mtz?


If you'd like to test in the presence of non-isomorphism, I'd recommend 
using the radiation damaged simulated dataset here:

http://bl831.als.lbl.gov/~jamesh/workshop2/decaying.mtz
as the derivative.  It is about 18% different from frac0.00.mtz (100% 
Se, but badly decayed).
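The post does not say which metric the "about 18% different" refers to. One common way to quantify how different two datasets are is a fractional difference over the reflections they share, sum |F_a - F_b| / sum F_a. The sketch below assumes amplitudes have already been read into dictionaries keyed by (h, k, l); the function name and the numbers in the example are hypothetical.

```python
def fractional_difference(f_a, f_b):
    """Sum |F_a - F_b| / sum F_a over reflections present in both datasets.

    f_a, f_b: dicts mapping (h, k, l) -> amplitude F (assumed on the same scale)."""
    common = f_a.keys() & f_b.keys()
    if not common:
        raise ValueError("no reflections in common")
    num = sum(abs(f_a[hkl] - f_b[hkl]) for hkl in common)
    den = sum(f_a[hkl] for hkl in common)
    return num / den


# Invented amplitudes, only to show the call.
native = {(1, 0, 0): 100.0, (0, 1, 1): 50.0}
derivative = {(1, 0, 0): 90.0, (0, 1, 1): 60.0, (2, 0, 0): 5.0}
print(fractional_difference(native, derivative))
```

Real datasets would need to be brought onto a common scale first, otherwise the number mostly measures the scale factor.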


Thanks for all the great ideas!

-James Holton
MAD Scientist


Re: [ccp4bb] a challenge

2013-01-14 Thread James Holton
I am absolutely delighted at the response I have gotten to my little 
John Henry Challenge!  Three people already have managed to do the 
impossible.  Congratulations to George Sheldrick, Pavol Skubak and Raj 
Pannu for finding ways to improve the phases over the ones I originally 
obtained (using the default settings of mlphare and dm) and build their 
way out of it.  This is quite useful information!  At least it is to me.


Nevertheless, I do think Frances Reyes has a point.  This was meant to 
be a map interpretation challenge, and not a SAD-phasing challenge.  I 
appreciate that the two are linked, but the reason I did not initially 
provide the anomalous data is because I thought it would be too much to 
ask people to re-do all the phasing, etc. Yes, there do appear to be 
ways to improve the maps beyond the particular way I phased them, but no 
matter how good your phasing program is, there will always be a level of 
anomalous signal that will lead to phases that are off enough to make 
building the model impossible.  Basically, once the map gets bad 
enough that just as many wrong atoms get built in as right atoms, 
then there is no escape.  However, I think human beings should still 
have an advantage when it comes to pattern recognition, and I remain 
curious to see if an insightful crystallographer can tip that balance in 
the right direction.  I am also still curious to see if tweaking some 
setting on some automated building program will do that too.  So, my 
original question remains: are automated building programs better than 
humans?  Any human?


I therefore declare the John Henry Challenge still open.


But yes, improving the phases can tip the balance too, and the accuracy 
of the anomalous differences will ultimately affect the accuracy of the 
phases, and so on.  This is a much broader challenge.  And I think the 
best way to frame it is with the question:

How low can the anomalous signal be before any conceivable approach fails?
and perhaps:
What is the best procedure to use for weak anomalous signal?

 For those who are interested in joining George, Pavol, Raj and others 
in this new challenge, the full spectrum of difficulty from trivial 
(100% Se incorporation) to a complete waste of time (0% Se, 100% S) is here:

http://bl831.als.lbl.gov/~jamesh/challenge/occ_scan/

The impossible.mtz for the John Henry Map Interpretation Challenge was 
derived from frac0.79.mtz and possible.mtz from frac0.78.mtz.  
These simulated 31% and 32% Se incorporation into Met side chains 
(respectively).  It has now been shown that both of these can be solved 
automatically if you do the phasing right. But what about frac0.80.mtz?  
Or frac0.90.mtz?  At least on this one coordinate of Se 
incorporation, the prowess of a particular approach can be given a 
score.  For example, a score of 0.78 means that the indicated 
procedure could solve the frac0.78.mtz dataset, but not the frac0.79.mtz 
dataset.
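The scoring rule can be written down in a few lines of Python. This is a sketch: the outcome dictionary is hypothetical, "solved" is whatever success criterion you adopt, and it assumes, as the definition implies, that everything easier than the score was also solved, so the scan stops at the first failure.

```python
from typing import Dict, Optional


def mtz_name(frac: float) -> str:
    """Dataset name in the occ_scan series, e.g. 0.78 -> 'frac0.78.mtz'."""
    return f"frac{frac:.2f}.mtz"


def difficulty_score(solved: Dict[float, bool]) -> Optional[float]:
    """Highest frac solved before the first failure, scanning from the easiest
    dataset (lowest frac, most Se) upward. None if even the easiest one failed."""
    score = None
    for frac in sorted(solved):
        if not solved[frac]:
            break
        score = frac
    return score


# Hypothetical outcomes for some pipeline (not real results).
outcomes = {0.76: True, 0.77: True, 0.78: True, 0.79: False, 0.80: False}
best = difficulty_score(outcomes)
print(best, mtz_name(best))
```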


Based on the reports I have gotten back so far, the difficulty score 
lineup is:


score  method
0.86   xds, xscale, right sites, crank2 (Pavol Skubak)
0.78   xds, xscale, right sites, mlphare, dm, phenix.autobuild using 20 models (James Holton)
0.75   xds, xscale, right sites, mlphare, dm, buccaneer/refmac/dm (James Holton)
0.71   xds, xscale, right sites, mlphare, dm, ARP/wARP 7.3 (James Holton)
0.51   xds, xscale, right sites, mlphare, dm, ARP/wARP 6.1.1 (James Holton)

Note that all of these attempts cheated on the sites.  Finding the 
sites seems to be harder than solving the structure once you've got 
them.  That lineup is:


score  method
0.82   cheating: xds, xscale, right phases, anomalous difference Fourier (James Holton)
0.79   xds, xscale, shelxc/d/e 3.5A NTRY=1 (George Sheldrick)
0.74   xds, autorickshaw (Santosh Panjikar)
0.65   xds, xscale, phenix.hyss --search=full (James Holton)
0.60   xds, xscale, shelxc/d with NTRY=100 (James Holton)

Here again, the score is the last dataset for which the heavy-atom site
constellation found is close enough to the right one to move forward.
This transition, like the model-building one, is remarkably sharp,
particularly if you let each step run for many cycles.  The graph
for model-building is here:

http://bl831.als.lbl.gov/~jamesh/challenge/build_CC_vs_frac.png
Note how the final map quality is pretty much independent of the initial 
map quality, up to the point where it all goes wrong.  I think this 
again is an example of the solution needing to be at least half right 
before it can be improved.  But perhaps someone can prove me wrong on 
that one?
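Because the transition is so sharp, a pipeline's score could in principle be located without running every dataset. The sketch below bisects over the 1% grid, working in integer percent to avoid floating-point keys. It assumes, hypothetically, that success is monotonic in frac; `fake_pipeline` merely stands in for a real run and is not an actual tool. If outcomes near the edge are noisy, a plain linear scan is safer.

```python
from typing import Callable


def find_transition(solves: Callable[[int], bool], lo: int = 0, hi: int = 100) -> int:
    """Return the largest frac (in percent) that 'solves' still handles.

    Assumes success is monotonic in frac, i.e. the transition is a single sharp step."""
    if not solves(lo):
        raise ValueError("even the easiest dataset fails")
    if solves(hi):
        return hi
    while hi - lo > 1:           # invariant: solves(lo) is True, solves(hi) is False
        mid = (lo + hi) // 2
        if solves(mid):
            lo = mid
        else:
            hi = mid
    return lo


calls = []


def fake_pipeline(percent: int) -> bool:
    """Hypothetical stand-in: 'solves' anything up to frac0.78."""
    calls.append(percent)
    return percent <= 78


print(find_transition(fake_pipeline) / 100, "found in", len(calls), "runs")
```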


For those who want the unmerged data, I have all the XDS_ASCII.HKL files 
here:

http://bl831.als.lbl.gov/~jamesh/challenge/occ_scan/XDS_ASCII.tgz

If you'd like to go all the way back to the images, you can get them 
from here:

http://bl831.als.lbl.gov/~jamesh/workshop2/
The badsignal dataset is what produced frac1.00.mtz, and goodsignal 
produced frac0.00.mtz.  You can generate anything in 

Re: [ccp4bb] a challenge

2013-01-14 Thread Bosch, Juergen
What is the best procedure to use for weak anomalous signal

That opens up a can of worms, which I'm happy to jump into.
We had very good success in the years 2003-2009 with SHELX for finding sites
(sometimes taking more than one trial), then force-feeding them to SHARP for phase
improvement. We should also say that most of the time, particularly in the more
difficult cases, XDS made the difference in detectable anomalous signal.

And no, we still have not published this. By 'we' I mean Marc Robien and myself
during our SGPP times.

Jürgen 
..
Jürgen Bosch
Johns Hopkins Bloomberg School of Public Health
Department of Biochemistry  Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Phone: +1-410-614-4742
Lab:  +1-410-614-4894
Fax:  +1-410-955-3655
http://lupo.jhsph.edu

On Jan 14, 2013, at 3:13, James Holton jmhol...@lbl.gov wrote:

 What is the best procedure to use for weak anomalous signal


Re: [ccp4bb] a challenge

2013-01-14 Thread Tim Gruene

Hello James and all other contributors,

I admit not having read all contributions to this thread. I understand
the John Henry Challenge as asking whether there is an 'automated way of
producing a model from impossible.mtz'. From looking at it, and without
having gone all the way to a PDB file, my feeling is that one could do it
without too much effort using baton mode in e.g. Coot.

I guess this is not what you (and this thread) mean by 'automated',
which leaves the impression that crystallographers have become quite
spoiled children: this notion undermines the effort and ingenuity that
the authors of programs like Coot, O, MIFit, FRODO, etc. have put in.
Compared with how models were prepared before these algorithms had
been implemented, there is a lot of automation even in looking at the
skeleton of a map!

Cheers,
Tim



Re: [ccp4bb] a challenge

2013-01-14 Thread Nat Echols
On Mon, Jan 14, 2013 at 11:18 AM, Tim Gruene t...@shelx.uni-ac.gwdg.de wrote:
 I admit not having read all contributions to this thread. I understand
 the John Henry Challenge as whether there is an 'automated way of
 producing a model from impossible.mtz'. From looking at it and without
 having gone all the way to a PDB-file my feeling is one could without
 too much effort from the baton mode in e.g. coot.

This should be even more possible if one also uses existing knowledge
about the expected structure of the protein: a kinase domain is quite
distinctive.  So, James, how much external information from homologous
structures are we allowed to use?  Running Phaser would certainly be
cheating, but if I take (for instance) a 25% identical kinase
structure, manually align it to the map and/or a partial model, and
use that as a guide to manually rebuild the target model, does that
meet the terms of the challenge?

-Nat


Re: [ccp4bb] a challenge

2013-01-14 Thread James Holton
I actually chose 3dko because it is a kinase (with a ligand), and
therefore an interesting candidate for a molecular replacement
score.  I have not set this up yet, but if you took all the PDB
entries that contain the word kinase and tried to molecular-replace
each of them into the 3dko dataset, what fraction of them would work?
I think that fraction would make a good score for a given molecular
replacement pipeline.

But if you want to bootstrap S-SAD phasing with a homolog, then I'd
say it's definitely cheating if you use a homolog close enough to
build your way out of the resulting density without any anomalous
information at all.

Perhaps the fairest way to do this would be a 2-dimensional
score: the frac of the dataset you used, plus the BLAST2 E-value of
the model you started with vs the 3dko sequence?
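A 2-dimensional score could be ranked without collapsing it into a single number. The sketch below assumes that a larger frac (less Se) and a larger E-value (a less similar starting model) both mean a harder achievement, and reports the attempts that no other attempt beats on both axes (the Pareto front). The class name and the entries are hypothetical, not reported results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One entry in a hypothetical 2-D lineup."""
    method: str
    frac: float     # hardest dataset solved (higher = less Se = harder)
    e_value: float  # BLAST E-value of starting model vs 3dko (higher = less help)


def dominates(a: Attempt, b: Attempt) -> bool:
    """a beats b if it is at least as good on both axes and strictly better on one."""
    at_least = a.frac >= b.frac and a.e_value >= b.e_value
    strictly = a.frac > b.frac or a.e_value > b.e_value
    return at_least and strictly


def pareto_front(attempts):
    """Attempts that no other attempt dominates."""
    return [a for a in attempts if not any(dominates(b, a) for b in attempts)]


attempts = [Attempt("A", 0.80, 1e-30), Attempt("B", 0.75, 1e-50), Attempt("C", 0.70, 1e-10)]
front = pareto_front(attempts)
print([a.method for a in front])
```

Reporting the front avoids having to choose a weight between the two axes; "B" drops out because "A" solved a harder dataset with a less similar model.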

-James Holton
MAD Scientist




Re: [ccp4bb] a challenge

2013-01-13 Thread George Sheldrick


I have now looked at James's two challenges to see what I could learn 
from them, and will try to give enough details so that less experienced 
readers of this list can repeat what I did and apply the experience 
thereby gained to solving their own structures. For those who are not 
interested in the details, the bottom line is that SHELXC/D/E can solve 
both 'possible' and 'impossible' almost routinely, starting by finding 
the substructure, without using any information derived from the known 
structure. It should be emphasised that this does not produce a fully 
refined structure, but the resulting poly-Ala trace of about 70% of the 
structure and 'free lunch' maps showing many side-chains would be a good 
starting point for programs (such as Buccaneer or wARP) that dock a 
known sequence and complete the structure. My students would of course 
be expected to complete the map interpretation themselves using the 
excellent facilities available in Coot; that is always very educational!


I used the current SHELX beta-test programs that will shortly be 
released as the official versions.


First I used Tim Gruene's mtz2sca to convert James's mtz files into a 
format that SHELX can read, and then ran SHELXC from the command line to 
make the files possible.hkl (native intensity data), possible_fa.hkl 
(h k l FA and phase shift alpha) and possible_fa.ins (the input file for 
SHELXD), and the same for 'impossible'. Alternatively I could have used 
Thomas Schneider's hkl2map GUI to call SHELXC/D/E. I looked at the 
d/sig row to see where to cut the resolution for finding the heavy 
atoms and decided on 3.5 Å (SHEL 999 3.5). If I had been able to input 
unmerged data to SHELXC, e.g. as XDS_ASCII.HKL, which is always unmerged, 
I would also have obtained a CC1/2 value that would also indicate where 
to cut the resolution. At 3.5 Å, d/sig was about 1.0, which 
is still rather low, but cutting at even lower resolution tends to give 
less accurate substructures. To compensate for this optimistic choice 
for the rather weak anomalous data, I increased the number of trials 
(NTRY) to 1. These are the two most critical parameters for SHELXD 
and, as it turns out, for the whole structure solution.
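The two resolution-cutoff indicators mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not what SHELXC computes. pick_cutoff keeps every shell, ordered from low to high resolution, until d/sig first drops below a chosen threshold. cc_half uses the usual random half-dataset definition of CC1/2 on unmerged intensities. The function names, the threshold of 1.0 and the shell values in the usage note are hypothetical.

```python
import random


def pick_cutoff(shells, threshold=1.0):
    """shells: list of (d_min in Angstrom, d/sig), ordered low -> high resolution.
    Returns the last d_min reached before d/sig first drops below `threshold`,
    or None if even the first shell is below it."""
    cutoff = None
    for d_min, d_sig in shells:
        if d_sig < threshold:
            break
        cutoff = d_min
    return cutoff


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cc_half(unmerged, seed=0):
    """unmerged: dict mapping hkl -> list of intensity observations.
    Randomly splits each reflection's observations into two halves,
    averages each half and correlates the two sets of half-dataset means."""
    rng = random.Random(seed)
    half1, half2 = [], []
    for obs in unmerged.values():
        if len(obs) < 2:
            continue  # need at least one observation in each half
        obs = list(obs)  # copy so the caller's list is not shuffled
        rng.shuffle(obs)
        mid = len(obs) // 2
        half1.append(sum(obs[:mid]) / mid)
        half2.append(sum(obs[mid:]) / (len(obs) - mid))
    return pearson(half1, half2)
```

For example, with hypothetical shells [(8.0, 3.1), (6.0, 2.2), (4.5, 1.5), (3.5, 1.0), (3.0, 0.7)] given as (d_min, d/sig), pick_cutoff returns 3.5, the same kind of choice as described above.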


However, before running the multi-CPU version of SHELXD, since the PDB 
file of the refined structure was available, I ran AnoDe to use the PDB 
file and anomalous data in possible_fa.hkl to check the substructure. 
This told me that for both 'possible' and 'impossible' it should be 
possible to find 12 well-defined sites, and also that the original 
impossible.mtz was inconsistently indexed. AnoDe also outputs a list of 
heavy atoms in SHELX format that can be input directly into SHELXE for 
density modification and tracing. However, that would be cheating because 
AnoDe reads the final PDB file to calculate the anomalous density, and I 
was trying to solve the structure without assuming the answer, even 
indirectly. In general, a substructure calculated in this way by AnoDe is 
always much more accurate and complete than one found ab initio from the 
anomalous data.


The best SHELXD solutions had CC 34.6 and CCweak 15.0 for 'possible' and 
28.4/13.2 for 'impossible'. I always tell people to aim for at least 
30/15, so maybe I should have done more than 1 tries for 
'impossible' but my wife was getting impatient (I had promised her that 
we could go for a walk in the snow) so I accepted it. I looked at the 
peaklist from SHELXD pretending not to know that there should be 12 
sites. There was a bit of a gap in peak height (0.53/0.42) between peaks 11 
and 12 for 'possible' and (0.53/0.45) between peaks 10 and 11 for 
'impossible', so for SHELXE I used -h11 and -h10 respectively. However, I 
also used the new -z option, which refines the substructure before 
starting on the phasing, and as it turned out this increased the number 
of heavy atoms to 12 in both cases, all of which were correct. I started 
shelxe with:


shelxe possible possible_fa -s0.55 -a30 -h11 -z -q -e1

and similarly for 'impossible'. I was expecting problems, so I ran 30 
cycles of autotracing (-a30); normally 3 would be enough. I just guessed the 
solvent content (-s0.55); maybe that could be fine-tuned. For SHELXE, 
there is a remarkably consistent rule that if the CC for the trace 
against the native data gets above 25%, the structure is solved. For 
'possible' this happened after 25 tracing cycles, and the final 'free 
lunch' map (-e1) was indeed convincing. However 'impossible' only 
reached a CC of 17% and although the map did not look completely wrong, 
I would not have been able to interpret it. So I changed one default 
parameter (-m30), increasing the number of density modification cycles 
to compensate for the poor starting phases, and ran the job again. CC 
reached 25% after 16 cycles and produced an excellent map and trace. 
Almost certainly, 'possible' would also benefit from the change, but it 
was solved anyway. As Tom 

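George's message gives three rules of thumb: a CC/CCweak of at least 30/15 for SHELXD, choosing the number of sites from a gap in the peak list, and a trace CC of about 25% as the sign that SHELXE has solved the structure. Each can be written down as a simple check. This is a hedged sketch with hypothetical function names. The 'largest relative drop' rule is only one naive way to formalise 'a bit of a gap'; in practice the peak list is inspected by eye. None of it reproduces what SHELXD or SHELXE do internally.

```python
def shelxd_solution_ok(cc, cc_weak, cc_min=30.0, cc_weak_min=15.0):
    """Rule of thumb from the thread: aim for CC >= 30 and CCweak >= 15 (in %)."""
    return cc >= cc_min and cc_weak >= cc_weak_min


def sites_from_gap(peak_heights):
    """Hypothetical helper: given positive peak heights sorted in decreasing order,
    return the number of peaks before the largest relative drop between neighbours."""
    best_n, best_ratio = len(peak_heights), 1.0
    for i in range(len(peak_heights) - 1):
        ratio = peak_heights[i + 1] / peak_heights[i]
        if ratio < best_ratio:
            best_n, best_ratio = i + 1, ratio
    return best_n


def shelxe_solved(trace_cc_percent, threshold=25.0):
    """Empirical rule from the thread: a trace CC against the native data
    reaching about 25% means the structure is solved."""
    return trace_cc_percent >= threshold
```

Applied to the numbers quoted above, shelxd_solution_ok(34.6, 15.0) is True for 'possible' and shelxd_solution_ok(28.4, 13.2) is False for 'impossible', in line with the remark that more trials might have been warranted there.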
Re: [ccp4bb] a challenge

2013-01-13 Thread Francis E Reyes
Ok, I'll bite.

I dare anyone who considers themself an expert macromolecular crystallographer 
to find a way to build out of this map.

I put emphasis on this map. 

Short of actually cheating (see below), there doesn't seem to be any automated 
way to arrive at a solved structure from these phases.

I put emphasis on these phases. 

I think the real challenge (and one that makes for an excellent macromolecular 
crystallographer) is how well one can interpret a map with poor phases. 

That being said, I think a recalculation of the map using any other information 
besides the map itself should not be allowed. 

PS. I'd like to see what the pre-DM phases look like. There's a huge chunk of 
the protein that is completely flattened out in impossible.mtz .

F




On Jan 12, 2013, at 1:50 PM, James Holton jmhol...@lbl.gov wrote:

 

Re: [ccp4bb] a challenge

2013-01-13 Thread Anastassis Perrakis
 I think the real challenge (and one that makes for an excellent 
 macromolecular crystallographer) is how well one can interpret a map with 
 poor phases. 

Let me disagree... An excellent macromolecular crystallographer is one who, 
given some crystals, can derive the best strategy to collect data,
process the data optimally, derive phases using all available information, 
build a model and refine it in such a way that it best explains both the data
and geometrical expectations, and do all this as efficiently as possible.

Efficiency may suggest using one automated suite or another - or indeed may 
best be achieved by manual labor - be it in the map or in data
collection strategy or refinement or another step: and here I am ignoring the 
art of transforming hair-needle-crystalline-like-dingbits to a diffracting 
crystal.

Someone who can interpret a map with poor phases can be either a genius in 3D 
orientation, or a not necessarily very intelligent or experienced but 
determined student who can drink and breathe this map for a few weeks in a 
row until a solution is in place. Neither would necessarily make an excellent 
macromolecular crystallographer.

Tassos

Re: [ccp4bb] a challenge

2013-01-13 Thread jens Preben Morth
I agree with Tassos, and btw think that this crystallographer should be 
able to go back into the lab and optimize the present crystal conditions 
to get better crystals, in particular when he or she realizes that the 
scientific question they set out to investigate cannot be answered by 
analyzing the final structure at the available data quality.

Preben


On 1/13/13 8:52 PM, Anastassis Perrakis wrote:



Re: [ccp4bb] a challenge

2013-01-13 Thread Demetres D. Leonidas
Since the discussion about crystallographers is fired up, I want to put on 
record that I totally agree with Tassos about the profile of a 
crystallographer. If you take away the crystals, then a crystallographer 
is no longer a crystallographer.


Demetres


On 13/1/2013 9:52 PM, Anastassis Perrakis wrote:



--
---
Dr. Demetres D. Leonidas
Associate Professor of Biochemistry
Department of Biochemistry  Biotechnology
University of Thessaly
26 Ploutonos Str.
41221 Larissa, Greece
-
Tel. +302410 565278
Tel. +302410 565297 (Lab)
Fax. +302410 565290
E-mail: ddleoni...@bio.uth.gr
http://www.bio.uth.gr
---


Re: [ccp4bb] a challenge

2013-01-12 Thread Pavol Skubak
Dear James,

your challenge in its current form ignores an important source
of information for model building that is available for your
simulated data - namely, it does not allow the use of anomalous
phase information in the model building. In difficult cases on
the edge of success such as this one, this typically makes
the difference between building and not building.

If you can make the F+/F- and Se substructure available, we
can test whether this is indeed the case. However, while I
expect this would push the challenge further significantly,
most likely you would be able to decrease the Se incorporation
of your simulated data further to such levels that the anomalous
signal is again no longer sufficient to build the structure. And
most likely, there would again exist an edge where a small
decrease in the Se incorporation would lead from a model built
to no model built.

Best regards,

-- 
Pavol Skubak
Biophysical Structural Chemistry
Gorlaeus Laboratories
Einsteinweg 55
Leiden University
LEIDEN  2333CC
the Netherlands
tel: 0031715274414
web: http://bsc.lic.leidenuniv.nl/people/skubak-0


Re: [ccp4bb] a challenge

2013-01-12 Thread George Sheldrick

Dear James,

I agree with Pavol that your example is not very realistic. In practice
one would start from the heavy atom positions. As well as providing
starting phases, they are useful in other ways. For example, shelxe
(and probably most other tracing programs) adds them to a 'no-go'
map so it knows where NOT to trace the main-chain.
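To make the 'no-go' idea concrete, here is a toy version: grid points within some radius of a heavy-atom site are flagged, so a tracer would not place main-chain atoms there. This is not SHELXE's implementation. The 2.0 Å radius, the orthogonal cell and the neglect of symmetry operators other than lattice translations are all simplifying assumptions, and no_go_mask is a hypothetical name.

```python
import math


def no_go_mask(grid_points, heavy_atoms, cell, radius=2.0):
    """Return one boolean per grid point: True where the point lies within
    `radius` Angstrom of any heavy-atom site. Coordinates are Cartesian in an
    orthogonal cell with edge lengths `cell` = (a, b, c); only lattice
    translations are considered (minimum-image convention)."""
    def dist(p, q):
        s = 0.0
        for pi, qi, length in zip(p, q, cell):
            d = (pi - qi) % length
            d = min(d, length - d)  # distance to the nearest periodic image
            s += d * d
        return math.sqrt(s)

    return [any(dist(p, h) <= radius for h in heavy_atoms) for p in grid_points]
```

A tracer using such a mask would simply reject candidate main-chain positions where the mask is True.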

Best wishes, George



--
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582




Re: [ccp4bb] a challenge

2013-01-12 Thread James Holton


Fair enough!

I have just now added DANO  and I(+)/I(-) to the files.  I'll be very 
interested to see what you can come up with!  For the record, the phases 
therein came from running mlphare with default parameters but exactly 
the correct heavy-atom constellation (all the sulfur atoms in 3dko), and 
then running dm with default parameters.


Yes, there are other ways to run mlphare and dm that give better phases, 
but I was only able to determine those parameters by cheating 
(comparing the resulting map to the right answer), so I don't think it 
is fair to use those maps.


I have had a few questions about what is cheating and what is not 
cheating.  I don't have a problem with the use of sequence information 
because that actually is something that you realistically would know 
about your protein when you sat down to collect data.  The sequence of 
this molecule is that of 3dko:

http://bl831.als.lbl.gov/~jamesh/challenge/seq.pir

I also don't have a problem with anyone actually using an automation 
program to _help_ them solve the impossible dataset as long as they 
can explain what they did.  Simply putting the above sequence into 
BALBES would, of course, be cheating!  I suppose one could try 
eliminating 3dko and its homologs from the BALBES search, but that, in 
and of itself, is perhaps relevant to the challenge: what is the most 
distant homolog that still allows you to solve the structure?  That, 
I think, is also a stringent test of model-building skill.


  I have already tried ARP/wARP, phenix.autobuild and 
buccaneer/refmac.  With default parameters, all of these programs fail 
on both the possible and impossible datasets.  It was only with some 
substantial tweaking that I found a way to get phenix.autobuild to crack 
the possible dataset (using 20 models in parallel).  I have not yet 
found a way to get any automation program to build its way out of the 
impossible dataset. Personally, I think that the breakthrough might be 
something like what Tom Terwilliger mentioned.  If you build a good 
enough starting set of atoms, then I think an automation program should 
be able to take you the rest of the way.  If that is the case, then it 
means people like Tom who develop such programs for us might be able to 
use that insight to improve the software, and that is something that 
will benefit all of us.


Or, it is entirely possible that I'm just not running the current 
software properly!  If so, I'd love it if someone who knows better (such 
as their developers) could enlighten me.


-James Holton
MAD Scientist

On 1/12/2013 3:07 AM, Pavol Skubak wrote:



Re: [ccp4bb] a challenge

2013-01-12 Thread James Holton


Fair enough!

The heavy atom positions are simply the S atoms in 3dko.  There are 22 
of them.  Also, in this case the Met side chains (12 of those) are 32% 
occupied with Se.  The other 68% is sulfur.   I think it is realistic 
that one could know the extent of Se incorporation ahead of time from 
something like mass spec (especially if you knew it could make-or-break 
your structure determination).  However, I don't think it is realistic 
that you would know where they are before running shelx.
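Why the degree of Se incorporation can make or break the phasing can be seen from the usual rough estimate of the anomalous signal, <|dF+-|>/<|F|> ~ sqrt(2 * sum f''^2) / (sqrt(N_T) * Z_eff), with Z_eff ~ 6.7 electrons per protein atom and N_T the number of protein atoms. A partially substituted Met site contributes the occupancy-weighted average of the Se and S values of f''. This is an order-of-magnitude sketch only. The f'' values and N_T must be supplied by the user (the numbers in the usage note are placeholders), and it assumes the 10 S sites that are not in Met side chains carry no Se.

```python
import math

Z_EFF = 6.7  # effective electrons per non-H protein atom in the usual estimate


def bijvoet_ratio(site_fpp, n_protein_atoms, z_eff=Z_EFF):
    """Estimate <|dF+-|>/<|F|> ~ sqrt(2 * sum(f''^2)) / (sqrt(N_T) * Z_eff),
    where site_fpp lists the effective f'' (electrons) of each anomalous site."""
    return (math.sqrt(2.0 * sum(f * f for f in site_fpp))
            / (math.sqrt(n_protein_atoms) * z_eff))


def mixed_site_fpp(frac_se, fpp_se, fpp_s):
    """Effective f'' of a Met site occupied frac_se by Se and 1 - frac_se by S,
    averaging the two scattering contributions by occupancy."""
    return frac_se * fpp_se + (1.0 - frac_se) * fpp_s
```

For example, with placeholder values f''(Se) = 4.0 e, f''(S) = 0.5 e and N_T = 2500, one would evaluate bijvoet_ratio([mixed_site_fpp(0.32, 4.0, 0.5)] * 12 + [0.5] * 10, 2500) and compare it with the same call at frac_se = 0.0 to see how much the 32% Se incorporation adds.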


-James Holton
MAD Scientist

On 1/12/2013 7:46 AM, George Sheldrick wrote:


Re: [ccp4bb] a challenge

2013-01-12 Thread James Holton


Woops!  sorry folks.  I made a mistake with the I(+)/I(-) entry. They 
had the wrong axis convention relative to 3dko and the F in the same 
file.  Sorry about that.


The files on the website now should be right.
http://bl831.als.lbl.gov/~jamesh/challenge/possible.mtz
http://bl831.als.lbl.gov/~jamesh/challenge/impossible.mtz

md5 sums:
c4bdb32a08c884884229e8080228d166  impossible.mtz
caf05437132841b595be1c0dc1151123  possible.mtz
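Before working on the files, it is worth checking that a download matches the corrected versions. Here is a minimal sketch using Python's standard hashlib module with the two md5 sums above. It assumes the files have been saved under their original names in the current directory; md5sum and check are illustrative names.

```python
import hashlib
import os

# md5 sums as posted for the corrected files
EXPECTED = {
    "impossible.mtz": "c4bdb32a08c884884229e8080228d166",
    "possible.mtz": "caf05437132841b595be1c0dc1151123",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check(path, expected_hex):
    """True if the file's md5 matches the expected digest (case-insensitive)."""
    return md5sum(path) == expected_hex.lower()


if __name__ == "__main__":
    for name, expected in EXPECTED.items():
        if not os.path.exists(name):
            print(f"{name}: not found")
        elif check(name, expected):
            print(f"{name}: OK")
        else:
            print(f"{name}: MISMATCH, re-download")
```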

-James Holton
MAD Scientist

On 1/12/2013 8:25 AM, James Holton wrote:



Re: [ccp4bb] a challenge

2013-01-12 Thread George Sheldrick

James,

I had in fact just come to the conclusion that the indexing was 
consistent with 3dko for 'possible' but not for 'impossible', 
which I suppose was logical.

George


Re: [ccp4bb] a challenge

2013-01-12 Thread James Holton


I admit that I made 'impossible' more difficult to solve than 'possible', 
but not in the way I had intended!  Again, sorry about that.  It is 
corrected now.


The change in indexing arises because I am processing the simulated 
images with a default run of XDS and, as you know, the autoindexing picks 
an indexing convention at random.  I flipped it back at the time, but 
when I just now went back to get the I(+)/I(-), I went just one step too 
far.
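When the lattice has higher symmetry than the crystal's point group, autoindexing can settle on any of several equally valid but mutually inconsistent indexings, so one data set has to be reindexed onto the other's convention. The thread does not say which operator relates the XDS indexing to 3dko, so this sketch uses a hypothetical operator, (h,k,l) -> (k,h,-l), purely for illustration. It also refuses operators with determinant -1: those invert the hand of the indexing, which for anomalous data amounts to swapping every I(+) with its I(-).

```python
import numpy as np

# Hypothetical reindexing operator (h, k, l) -> (k, h, -l). The operator that
# actually relates the XDS indexing to 3dko is not given in the thread.
REINDEX = np.array([[0, 1, 0],
                    [1, 0, 0],
                    [0, 0, -1]])


def reindex(hkl, op=REINDEX):
    """Apply an integer reindexing matrix to an (N, 3) array of Miller indices.

    Each row h is replaced by op @ h. A determinant of -1 would invert the hand
    of the indexing (for anomalous data, swapping I(+) and I(-)), so it is refused.
    """
    op = np.asarray(op)
    if round(np.linalg.det(op)) != 1:
        raise ValueError("reindexing operator must have determinant +1")
    return np.asarray(hkl) @ op.T


print(reindex([[1, 2, 3], [0, 0, 4]]).tolist())  # [[2, 1, -3], [0, 0, -4]]
```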


Once again, sorry.  It was not my intention to waste anyone's time!

-James Holton
MAD Scientist

On 1/12/2013 2:09 PM, George Sheldrick wrote:

James,

I had in fact just come to the conclusion that the indexing was 
consistent with 3dko for 'possible' but not for 'impossible', which I 
suppose was logical.

George

Whoops!  Sorry, folks.  I made a mistake with the I(+)/I(-) entry.  
They had the wrong axis convention relative to 3dko and to the F in the 
same file.  Sorry about that.


The files on the website now should be right.
http://bl831.als.lbl.gov/~jamesh/challenge/possible.mtz
http://bl831.als.lbl.gov/~jamesh/challenge/impossible.mtz

md5 sums:
c4bdb32a08c884884229e8080228d166  impossible.mtz
caf05437132841b595be1c0dc1151123  possible.mtz
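Because checksums are posted, a downloaded copy can be verified before it is used. A minimal Python sketch, assuming the two files have been saved under their original names in the current directory; the helper names `md5sum` and `verify` are hypothetical:

```python
import hashlib
from pathlib import Path

# md5 sums as posted on the list for the corrected files
EXPECTED_MD5 = {
    "impossible.mtz": "c4bdb32a08c884884229e8080228d166",
    "possible.mtz": "caf05437132841b595be1c0dc1151123",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to True/False, or None if the file is missing."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5sum(path) == expected if path.exists() else None
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(name, {True: "OK", False: "MISMATCH", None: "missing"}[ok])
```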

-James Holton
MAD Scientist

On 1/12/2013 8:25 AM, James Holton wrote:


Fair enough!

I have just now added DANO and I(+)/I(-) to the files. I'll be very 
interested to see what you can come up with! For the record, the 
phases therein came from running mlphare with default parameters but 
with exactly the correct heavy-atom constellation (all the sulfur atoms 
in 3dko), and then running dm with default parameters.
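The heavy-atom constellation described here is simply every sulfur position in 3dko. As a hedged sketch of the first half of that recipe, the Python below pulls sulfur sites out of a PDB-format coordinate file using the fixed PDB column layout. The file name `3dko.pdb` in the usage comment is an assumption (a local download), the function name is hypothetical, and converting these sites into mlphare input is not shown.

```python
def sulfur_sites(pdb_lines):
    """Return (resname, chain, resseq, atom, x, y, z) for every sulfur atom."""
    sites = []
    for line in pdb_lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atom = line[12:16].strip()
        element = line[76:78].strip().upper()
        # Older files may lack the element column (cols 77-78); fall back to
        # the Met SD / Cys SG atom names in that case.
        is_sulfur = element == "S" if element else atom in ("SD", "SG")
        if not is_sulfur:
            continue
        sites.append((
            line[17:20].strip(),   # residue name
            line[21],              # chain ID
            int(line[22:26]),      # residue number
            atom,
            float(line[30:38]),    # x (Angstrom)
            float(line[38:46]),    # y
            float(line[46:54]),    # z
        ))
    return sites

# Usage (assumes a local copy of the 3dko coordinates saved as 3dko.pdb):
# with open("3dko.pdb") as fh:
#     for site in sulfur_sites(fh):
#         print(site)
```

Selenium (for example the SE atom of an MSE residue) is deliberately not matched, since only sulfur sites were used for these phases.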


Yes, there are other ways to run mlphare and dm that give better 
phases, but I was only able to determine those parameters by 
cheating (comparing the resulting map to the right answer), so I 
don't think it is fair to use those maps.


I have had a few questions about what is cheating and what is not 
cheating.  I don't have a problem with the use of sequence 
information because that actually is something that you 
realistically would know about your protein when you sat down to 
collect data.  The sequence of this molecule is that of 3dko:

http://bl831.als.lbl.gov/~jamesh/challenge/seq.pir

  I also don't have a problem with anyone actually using an 
automation program to _help_ them solve the impossible dataset as 
long as they can explain what they did.  Simply putting the above 
sequence into BALBES would, of course, be cheating!  I suppose one 
could try eliminating 3dko and its homologs from the BALBES 
search, but that, in and of itself, is perhaps relevant to the 
challenge: what is the most distant homolog that still allows you 
to solve the structure?  That, I think, is also a stringent test 
of model-building skill.


  I have already tried ARP/wARP, phenix.autobuild and 
buccaneer/refmac.  With default parameters, all of these programs 
fail on both the possible and impossible datasets.  It was only 
with some substantial tweaking that I found a way to get 
phenix.autobuild to crack the possible dataset (using 20 models in 
parallel).  I have not yet found a way to get any automation program 
to build its way out of the impossible dataset.   Personally, I 
think that the breakthrough might be something like what Tom 
Terwilliger mentioned.  If you build a good enough starting set of 
atoms, then I think an automation program should be able to take you 
the rest of the way.  If that is the case, then it means people like 
Tom who develop such programs for us might be able to use that 
insight to improve the software, and that is something that will 
benefit all of us.


Or, it is entirely possible that I'm just not running the current 
software properly!  If so, I'd love it if someone who knows better 
(such as their developers) could enlighten me.


-James Holton
MAD Scientist

On 1/12/2013 3:07 AM, Pavol Skubak wrote:


Dear James,

your challenge in its current form ignores an important source
of information for model building that is available for your
simulated data - namely, it does not allow the use of anomalous
phase information in the model building. In difficult cases on
the edge of success such as this one, this typically makes
the difference between building and not building.

If you can make the F+/F- and Se substructure available, we
can test whether this is indeed the case. However, while I
expect this would push the challenge further significantly,
most likely you would be able to decrease the Se incorporation
of your simulated data further to such levels that the anomalous
signal is again no longer sufficient to build the structure. And
most likely, there would again exist an edge where a small
decrease in the Se incorporation would lead from a model built
to no model built.

Best regards,

--
Pavol Skubak
Biophysical Structural Chemistry
Gorlaeus Laboratories
Einsteinweg 55
Leiden University
LEIDEN  2333CC
the Netherlands
tel: 0031715274414
web: 

Re: [ccp4bb] a challenge

2013-01-12 Thread Pavol Skubak
I can build from the impossible.mtz data in the following two steps:

1. obtaining the Se substructure from an anomalous difference map
constructed from impossible.mtz

2. running combined model building using the substructure
from step 1, starting from the impossible.mtz map

Only impossible.mtz and the sequence (which is probably not
really necessary) are used in this solution.

It is not a fully automatic solution: step 2 (model building
combined with density modification and phasing via a recently
developed multivariate SAD function) was performed
automatically using CRANK (which calls Buccaneer, REFMAC
and Parrot), while step 1 was done manually using CCP4 tools
(cfft and peakmax).
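Step 1 above was done with the CCP4 programs cfft and peakmax. The numpy sketch below is not those programs, only an illustration of the same idea: an anomalous difference Fourier with coefficients DANO·exp(i(PHI - 90°)), followed by picking the highest local maxima. It assumes space group P1 (no symmetry expansion), a list of unique reflections with no Friedel mates included, and DANO and PHI values already read from the MTZ file into arrays. The function names are hypothetical.

```python
import itertools

import numpy as np


def anomalous_difference_map(hkl, dano, phi_deg, grid):
    """P1 anomalous difference Fourier on a grid (n1, n2, n3), up to a 1/V scale.

    Coefficients are DANO * exp(i(phi - 90 deg)); each reflection's Friedel
    mate is added as the complex conjugate so the map comes out real.
    The grid must be larger than twice the highest |h|, |k|, |l|.
    """
    coeffs = np.zeros(grid, dtype=complex)
    f = np.asarray(dano) * np.exp(1j * np.deg2rad(np.asarray(phi_deg) - 90.0))
    for (h, k, l), value in zip(hkl, f):
        coeffs[h % grid[0], k % grid[1], l % grid[2]] += value
        coeffs[-h % grid[0], -k % grid[1], -l % grid[2]] += np.conj(value)
    # fftn computes sum_h F(h) exp(-2 pi i h.x) at the grid points x = n / grid
    return np.fft.fftn(coeffs).real


def pick_peaks(rho, n_peaks):
    """Return fractional coordinates and heights (in map sigma) of the top local maxima."""
    is_max = np.ones(rho.shape, dtype=bool)
    # Compare every grid point with its 26 neighbours, wrapping around the cell.
    for shift in itertools.product((-1, 0, 1), repeat=3):
        if shift != (0, 0, 0):
            is_max &= rho >= np.roll(rho, shift, axis=(0, 1, 2))
    idx = np.argwhere(is_max)
    heights = rho[is_max]
    order = np.argsort(heights)[::-1][:n_peaks]
    return idx[order] / np.array(rho.shape), heights[order] / rho.std()
```

A real substructure search would also need symmetry expansion, a finer grid and removal of symmetry-equivalent peaks; those are left out to keep the idea visible.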

Compared with the deposited model, 96% of the main chain is
(correctly) built, 92% is (correctly) docked, and the R factor
is 21%. Clearly, the (relatively) weak anomalous signal is the
only limitation in this case. However, the model-building
procedure did not struggle too much; I expect it would still
work if the Se incorporation were decreased somewhat further
(as long as the substructure can be obtained in some way).
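For reference, the conventional R factor quoted here compares observed and calculated amplitudes, R = Σ| |Fobs| - k|Fcalc| | / Σ|Fobs|, and Rfree is the same quantity computed over a test set of reflections excluded from refinement. A minimal sketch, assuming a single overall least-squares scale factor k; refinement programs typically apply more elaborate scaling, so numbers from this sketch need not match theirs exactly.

```python
import numpy as np


def r_factor(f_obs, f_calc, scale=None):
    """R = sum| |Fobs| - k|Fcalc| | / sum|Fobs|.

    If no scale k is given, use the least-squares overall scale
    k = sum(|Fobs||Fcalc|) / sum(|Fcalc|^2).
    """
    fo = np.abs(np.asarray(f_obs, dtype=float))
    fc = np.abs(np.asarray(f_calc, dtype=float))
    k = np.dot(fo, fc) / np.dot(fc, fc) if scale is None else scale
    return float(np.sum(np.abs(fo - k * fc)) / np.sum(fo))


def r_work_r_free(f_obs, f_calc, is_free):
    """Return (Rwork, Rfree), with the scale fitted on the working set only."""
    fo = np.abs(np.asarray(f_obs, dtype=float))
    fc = np.abs(np.asarray(f_calc, dtype=float))
    free = np.asarray(is_free, dtype=bool)
    work = ~free
    k = np.dot(fo[work], fc[work]) / np.dot(fc[work], fc[work])
    return r_factor(fo[work], fc[work], k), r_factor(fo[free], fc[free], k)
```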

Of course, this is not a pure solution in the sense that
I started from impossible.mtz rather than from scratch, i.e.,
from the data only. Obtaining the substructure from scratch
might be more difficult.

Pavol


On Sat, Jan 12, 2013 at 10:50 PM, James Holton jmhol...@lbl.gov wrote:


 Whoops!  Sorry, folks.  I made a mistake with the I(+)/I(-) entry.  They had
 the wrong axis convention relative to 3dko and to the F in the same file.
 Sorry about that.

 The files on the website now should be right.
 http://bl831.als.lbl.gov/~jamesh/challenge/possible.mtz
 http://bl831.als.lbl.gov/~jamesh/challenge/impossible.mtz

 md5 sums:
 c4bdb32a08c884884229e8080228d166  impossible.mtz
 caf05437132841b595be1c0dc1151123  possible.mtz

 -James Holton
 MAD Scientist



Re: [ccp4bb] a challenge

2013-01-11 Thread Terwilliger, Thomas C
Hi James,

As an aside (as your point is to look for a John Henry, not to investigate 
automated model-building), I would point out that it is not at all uncommon to 
find cases where a very small difference in starting parameters or starting 
phases leads to a very different final result in automated model-building. I 
suspect that this comes from the discrete nature of model-building: an atom 
goes either here or there, and every time you put something in you have branched 
the search. When this model is then used to calculate a map, you get a new 
map that depends on the exact branching, so small starting perturbations 
can become amplified.

As you have found a way to automatically build possible.mtz, I would expect 
that some small change in parameters or software would solve the impossible one 
too (not that one could necessarily find this change easily).

All the best,
Tom T

On Jan 11, 2013, at 12:13 PM, James Holton wrote:

 I have a challenge for all those expert model-builders out there: can you 
 beat the machine?
 
 It seems these days that everything is automated, and the only decision left 
 for a crystallographer to make is which automation package to use.  But has 
 crystallography really been solved?  Is looking at maps now no more 
 interesting than playing chess, or any of the other once noble pursuits of 
 human beings that we no longer see as challenging because someone built a 
 machine that can do the job better than any of us?
 
 I think not.  But I need your help to prove it.
 
 Specifically, the phases in this file:
 http://bl831.als.lbl.gov/~jamesh/challenge/possible.mtz
 when fed with the right set of parameters into the best model-building
 package I have available to me, actually do converge to the correct
 structure, with nice low R/Rfree.
 However, THIS file:
 http://bl831.als.lbl.gov/~jamesh/challenge/impossible.mtz
 contains the same amplitudes but very slightly different phases from those in 
 possible.mtz above, and this file invariably leads to abysmal failure of 
 every model-building package I have tried.
 
 Short of cheating (aka using molecular replacement with the right answer: 
 3dko), I don't think there is any automated way to arrive at a solved 
 structure from impossible.mtz.  What is interesting about this is how 
 remarkably similar these two maps are. In fact, the correlation coefficient 
 between them is 0.92. And yet, one can be solved automatically, and the other 
 can't.
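The 0.92 figure is a map correlation coefficient. The message does not say which program or map region was used, so the following is only a sketch of the standard Pearson definition, CC = Σ(ρ1 - ⟨ρ1⟩)(ρ2 - ⟨ρ2⟩) / sqrt(Σ(ρ1 - ⟨ρ1⟩)² Σ(ρ2 - ⟨ρ2⟩)²), assuming both maps have already been computed on the same grid over the same volume (for example one unit cell). The function name `map_cc` is hypothetical.

```python
import numpy as np


def map_cc(map_a, map_b):
    """Pearson correlation coefficient between two maps sampled on the same grid."""
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("maps must be sampled on the same grid")
    # Subtract the means without modifying the caller's arrays.
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))
```

Because the mean and overall scale drop out, the CC is insensitive to the absolute scale of each map, which makes it a convenient way to compare maps that were calculated from different phase sets.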
 
 More details can be found on the web page:
 http://bl831.als.lbl.gov/~jamesh/challenge/
 
 But, my question for the CCP4BB is:
 
 Are there any John Henrys left out there who can still beat the
 machine? Anyone?
 
 -James Holton
 MAD Scientist