I can build from the impossible.mtz data in the following two steps:
1. obtaining the Se substructure from an anomalous difference map
constructed from impossible.mtz
2. running combined model building using the substructure
from step 1 and starting from the impossible.mtz map
Only impossible.mtz and the sequence (which is probably not
really necessary) are used in this solution.
It is not a fully automatic solution. Step 2 (model building
combined with density modification and phasing via a recently
developed multivariate SAD function) was performed
automatically using CRANK (which calls Buccaneer, REFMAC
and Parrot); step 1 was done manually using the CCP4 tools
cfft and peakmax.
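To illustrate the idea behind the peak search in step 1, here is a conceptual sketch in Python. It is not how peakmax works internally and it is not what was run here: it assumes the anomalous difference map has already been sampled onto a grid covering one full unit cell, and it ignores symmetry and peak interpolation. The function name, the nested-list grid layout and the 4-sigma default cutoff are hypothetical choices made only for this example.

```python
import math


def find_peaks(grid, sigma_cutoff=4.0):
    """Return (height_in_sigma, (i, j, k)) for local maxima above the cutoff.

    `grid` is a nested list grid[i][j][k] of map values covering one full
    unit cell (hypothetical layout), so neighbours wrap around periodically.
    Each grid dimension should be at least 3 for the neighbour test to work.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    values = [v for plane in grid for row in plane for v in row]
    mean = sum(values) / len(values)
    rms = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if rms == 0:
        return []  # a flat map has no peaks

    offsets = [
        (di, dj, dk)
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        for dk in (-1, 0, 1)
        if (di, dj, dk) != (0, 0, 0)
    ]
    peaks = []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                value = grid[i][j][k]
                height = (value - mean) / rms
                if height < sigma_cutoff:
                    continue
                # Keep the point only if it is higher than all 26 neighbours.
                if all(
                    value > grid[(i + di) % nx][(j + dj) % ny][(k + dk) % nz]
                    for di, dj, dk in offsets
                ):
                    peaks.append((height, (i, j, k)))
    peaks.sort(reverse=True)  # strongest peak first
    return peaks
```

The strongest peaks in the returned list would be the candidate Se sites; in practice each one still has to be checked by hand, which is why step 1 was manual.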
Compared to the deposited model, 96% of the main chain is
(correctly) built, 92% is (correctly) docked, and the R factor
is 21%. Clearly, the (relatively) weak anomalous signal is the
only limitation in this case. However, the model building
procedure did not struggle too much, so I expect it would still
work if the Se incorporation were decreased somewhat further
(as long as the substructure can be obtained in some way).
Of course, this is not a pure solution in the sense that
I started from impossible.mtz rather than from scratch, i.e.
from the data only. Obtaining the substructure from scratch
might be more difficult.
Pavol
On Sat, Jan 12, 2013 at 10:50 PM, James Holton jmhol...@lbl.gov wrote:
Whoops! Sorry, folks. I made a mistake with the I(+)/I(-) entries. They had
the wrong axis convention relative to 3dko and the F in the same file.
Sorry about that.
The files on the website now should be right.
http://bl831.als.lbl.gov/~jamesh/challenge/possible.mtz
http://bl831.als.lbl.gov/~jamesh/challenge/impossible.mtz
md5 sums:
c4bdb32a08c884884229e8080228d166 impossible.mtz
caf05437132841b595be1c0dc1151123 possible.mtz
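
For anyone who wants to confirm they have the corrected files, a minimal Python sketch of the checksum comparison follows. The two digests are the ones quoted above; the function names and the assumption that the files sit in a single download directory are illustrative only.

```python
import hashlib
import os

# MD5 sums quoted in the message above.
EXPECTED_MD5 = {
    "impossible.mtz": "c4bdb32a08c884884229e8080228d166",
    "possible.mtz": "caf05437132841b595be1c0dc1151123",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory="."):
    """Map each expected file to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            results[name] = None
        else:
            results[name] = md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in check_downloads().items():
        status = {True: "OK", False: "MISMATCH", None: "not found"}[ok]
        print(f"{name}: {status}")
```

A MISMATCH most likely means the copy predates the axis-convention fix and should be downloaded again.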
-James Holton
MAD Scientist
On 1/12/2013 8:25 AM, James Holton wrote:
Fair enough!
I have just now added DANO and I(+)/I(-) to the files. I'll be very
interested to see what you can come up with! For the record, the phases
therein came from running mlphare with default parameters but exactly the
correct heavy-atom constellation (all the sulfur atoms in 3dko), and then
running dm with default parameters.
Yes, there are other ways to run mlphare and dm that give better phases,
but I was only able to determine those parameters by cheating (comparing
the resulting map to the right answer), so I don't think it is fair to
use those maps.
I have had a few questions about what is cheating and what is not
cheating. I don't have a problem with the use of sequence information
because that actually is something that you realistically would know about
your protein when you sat down to collect data. The sequence of this
molecule is that of 3dko:
http://bl831.als.lbl.gov/~jamesh/challenge/seq.pir
I also don't have a problem with anyone actually using an automation
program to _help_ them solve the impossible dataset as long as they can
explain what they did. Simply putting the above sequence into BALBES
would, of course, be cheating! I suppose one could try eliminating 3dko
and its homologs from the BALBES search, but that, in and of itself, is
perhaps relevant to the challenge: what is the most distant homolog that
still allows you to solve the structure? That, I think, is also a
stringent test of model-building skill.
I have already tried ARP/wARP, phenix.autobuild and buccaneer/refmac.
With default parameters, all of these programs fail on both the possible
and impossible datasets. It was only with some substantial tweaking that
I found a way to get phenix.autobuild to crack the possible dataset
(using 20 models in parallel). I have not yet found a way to get any
automation program to build its way out of the impossible dataset.
Personally, I think that the breakthrough might be something like what Tom
Terwilliger mentioned. If you build a good enough starting set of atoms,
then I think an automation program should be able to take you the rest of
the way. If that is the case, then it means people like Tom who develop
such programs for us might be able to use that insight to improve the
software, and that is something that will benefit all of us.
Or, it is entirely possible that I'm just not running the current software
properly! If so, I'd love it if someone who knows better (such as their
developers) could enlighten me.
-James Holton
MAD Scientist
On 1/12/2013 3:07 AM, Pavol Skubak wrote:
Dear James,
your challenge in its current form ignores an important source
of information for model building that is available for your
simulated data - namely, it does not allow the use of anomalous
phase information in the model building. In difficult cases on
the edge of success such as this one, this typically makes
the difference between building and not building.
If you can make the F+/F- and Se substructure available, we
can test whether