Re: [ccp4bb] when does non-isomorphism become a habit?

2020-12-08 Thread Manfred S. Weiss

Dear James,

let's spin the thought a bit further. What if there is a new program
(Phaser-II) some day,
for which the coordinates in your "new crystal form" are all of a sudden
within the
radius of convergence again? Does this bring your "new crystal form"
back to the old
crystal form again?

I'd say it is neither cell dimensions nor space group that define a new
crystal form. It
is the packing of the molecules. It happens quite often in dehydration
experiments
that molecules in a lattice move a bit and symmetry elements are lost or
new symmetry
elements are created. The space group changes, as well as the unit cell
dimensions. But
the packing remains essentially the same. You can usually infer what
happens, but you
still need molecular replacement to actually solve the "new" form.

Best, Manfred


On 09.12.2020 at 01:56, James Holton wrote:

I have a semantics question, and I know how much this forum loves
discussing semantics.

We've all experienced non-isomorphism, where two crystals, perhaps
even grown from the same drop, yield different data. Different enough
so that merging them makes your overall data quality worse. I'd say
that is a fairly reasonable definition of non-isomorphism? Most of the
time unit cell changes are telling, but in the end it is the I/sigma
and resolution limit that we care about the most.

Now, of course, even for non-isomorphous data sets you can usually
"solve" the non-isomorphous data without actually doing molecular
replacement.  All you usually need to do is run pointless using the
PDB file from the first crystal as a reference, and it will re-index
the data to match the model.  Then you just do a few cycles of rigid
body and you're off and running.  A nice side-effect of this is that
all your PDB files will line up when you load them into coot.  No
worries about indexing ambiguities, space group assignment, or origin
choice. Phaser is a great program, but you don't have to run it on
everything.

My question is: what about when you DO have to run Phaser to solve
that other crystal from the same drop?  What if the space group is the
same, the unit cell is kinda-sorta the same, but the coordinates have
moved enough so as to be outside the radius of convergence of
rigid-body refinement?  Does that qualify as a different "crystal
form" or different "crystal habit"?  Or is it the same form, and just
really non-isomorphous?

Opinions?

-James Holton
MAD Scientist





--
Dr. Manfred S. Weiss
Macromolecular Crystallography
Helmholtz-Zentrum Berlin
Albert-Einstein-Str. 15
D-12489 Berlin
Germany









Re: [ccp4bb] when does non-isomorphism become a habit?

2020-12-08 Thread Robert Stroud
Hmmm… interesting, James! I don't usually make a habit of trying to form a
class, or losing face because of a point - or even making a group in space.
But...

I'd say it is neither a different form nor a different habit, since these are
centuries-old terms - and still valuable ones - from mineralogy. Form reflects
the faces that grow preferentially due to the packing of molecules within the
crystal, so your crystal may or may not have a different form; that term doesn't
settle it. Likewise, crystal habit reflects the growth of the crystals over and
above crystal form, and again it could be the same habit or a different one.
I'd say this great example is more like a conformational variant that is
non-isomorphous with the previous structure - you got it right the first time,
I'd say!

I propose that these examples should be termed something different… a "Holton
fractal".
Robert Stroud
str...@msg.ucsf.edu
415 987 7535



> On Dec 8, 2020, at 4:56 PM, James Holton  wrote:
> 
> I have a semantics question, and I know how much this forum loves discussing 
> semantics.
> 
> We've all experienced non-isomorphism, where two crystals, perhaps even grown 
> from the same drop, yield different data. Different enough so that merging 
> them makes your overall data quality worse. I'd say that is a fairly 
> reasonable definition of non-isomorphism? Most of the time unit cell changes 
> are telling, but in the end it is the I/sigma and resolution limit that we 
> care about the most.
> 
> Now, of course, even for non-isomorphous data sets you can usually "solve" 
> the non-isomorphous data without actually doing molecular replacement.  All 
> you usually need to do is run pointless using the PDB file from the first 
> crystal as a reference, and it will re-index the data to match the model.  
> Then you just do a few cycles of rigid body and you're off and running.  A 
> nice side-effect of this is that all your PDB files will line up when you 
> load them into coot.  No worries about indexing ambiguities, space group 
> assignment, or origin choice. Phaser is a great program, but you don't have 
> to run it on everything.
> 
> My question is: what about when you DO have to run Phaser to solve that other 
> crystal from the same drop?  What if the space group is the same, the unit 
> cell is kinda-sorta the same, but the coordinates have moved enough so as to 
> be outside the radius of convergence of rigid-body refinement?  Does that 
> qualify as a different "crystal form" or different "crystal habit"?  Or is it 
> the same form, and just really non-isomorphous?
> 
> Opinions?
> 
> -James Holton
> MAD Scientist
> 
> 
> 





[ccp4bb] when does non-isomorphism become a habit?

2020-12-08 Thread James Holton
I have a semantics question, and I know how much this forum loves 
discussing semantics.


We've all experienced non-isomorphism, where two crystals, perhaps even 
grown from the same drop, yield different data. Different enough so that 
merging them makes your overall data quality worse. I'd say that is a 
fairly reasonable definition of non-isomorphism? Most of the time unit 
cell changes are telling, but in the end it is the I/sigma and 
resolution limit that we care about the most.


Now, of course, even for non-isomorphous data sets you can usually 
"solve" the non-isomorphous data without actually doing molecular 
replacement.  All you usually need to do is run pointless using the PDB 
file from the first crystal as a reference, and it will re-index the 
data to match the model.  Then you just do a few cycles of rigid body 
and you're off and running.  A nice side-effect of this is that all your 
PDB files will line up when you load them into coot.  No worries about 
indexing ambiguities, space group assignment, or origin choice. Phaser 
is a great program, but you don't have to run it on everything.
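
(If you script this, the workflow above is only a couple of external program
calls. Below is a minimal Python sketch of it, assuming a CCP4 environment with
pointless and refmac5 on the PATH; the file names are placeholders, and the
reference-file and rigid-body keywords are my recollection of the
POINTLESS/REFMAC5 syntax, so check them against the current documentation. In
practice you would also scale and merge, e.g. with AIMLESS, between the two
steps.)

import subprocess

def reindex_to_reference(unmerged_mtz, reference_pdb, out_mtz):
    # POINTLESS can take a reference coordinate file (XYZIN) and will pick
    # the indexing of the new data set that matches the old model.
    subprocess.run(
        ["pointless", "XYZIN", reference_pdb,
         "HKLIN", unmerged_mtz, "HKLOUT", out_mtz],
        input="", text=True, check=True)

def rigid_body(reference_pdb, merged_mtz, out_pdb, out_mtz, ncyc=10):
    # A few cycles of rigid-body refinement in REFMAC5 to settle the old
    # model into the new cell (keyword spelling is an assumption on my part).
    keywords = f"refi type rigid\nrigid ncycle {ncyc}\nend\n"
    subprocess.run(
        ["refmac5", "XYZIN", reference_pdb, "HKLIN", merged_mtz,
         "XYZOUT", out_pdb, "HKLOUT", out_mtz],
        input=keywords, text=True, check=True)

reindex_to_reference("xtal2_unmerged.mtz", "xtal1.pdb", "xtal2_reindexed.mtz")
# ... scale and merge here ...
rigid_body("xtal1.pdb", "xtal2_merged.mtz", "xtal2_rigid.pdb", "xtal2_rigid.mtz")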


My question is: what about when you DO have to run Phaser to solve that 
other crystal from the same drop?  What if the space group is the same, 
the unit cell is kinda-sorta the same, but the coordinates have moved 
enough so as to be outside the radius of convergence of rigid-body 
refinement?  Does that qualify as a different "crystal form" or 
different "crystal habit"?  Or is it the same form, and just really 
non-isomorphous?


Opinions?

-James Holton
MAD Scientist





[ccp4bb] MicroED workshop 2020

2020-12-08 Thread Johan Hattne

Dear all;

We are pleased to announce the 6th MicroED workshop to be held on 
December 21, 2020. This will be a free virtual event.  Please register 
at the following website to receive your link


  https://cryoem.ucla.edu/course

Topics covered will include all aspects of microcrystal electron 
diffraction (MicroED) of diverse samples such as soluble proteins, 
membrane proteins, small molecules, natural products, and other materials.


We will cover

  1. Sample preparation methodologies
  2. EPU-D for data collection
  3. Camera systems and performance
  4. FIB-milling procedures for soluble and membrane protein samples
  5. Data analysis software, refinement, and structure solution

The schedule is available on the registration page.

// Sincerely; on behalf of the organizing committee (Tamir Gonen, 
Brandon Mercado, Brent Nannenga)






Re: [ccp4bb] seek your opinion on this weird diffraction pattern

2020-12-08 Thread Kevin Jin
Hi Joseph,

It is a very beautiful crystal with neat diffraction patterns and a clean
background, which may suggest good cryoprotection. Here are my observations
from your images.

1. From your image taken at 30 degrees (first page), I believe it is a
protein crystal with a longer unit cell length in one dimension, rather
than an inorganic crystal. By measuring the distance between two adjacent
spots along a row, you should be able to estimate the unit cell lengths in
two dimensions (a short back-of-the-envelope calculation follows after this
list).

2. In the first crystal image there is a line down the middle of the
crystal. If I am reading it correctly, two crystals may have grown and
merged together. I have had cases like this: one half of the crystal showed
a hexahedral shape and the other half a nearly cubic shape, and the
structures were eventually determined in both P3 and P222 after refining the
conditions. If this is true in your case, you may be looking at
multi-crystal growth.

3. In the 90-degree image on your second page, the three diffraction spots
in the bottom-right corner (4:30 o'clock) could come from a protein crystal
with a shorter cell length. Two possibilities: very anisotropic growth, or
they may only show up with longer exposures.

4. Judging from the appearance of your crystal, it may well carry P3 or P6
symmetry. If so, it won't be an inorganic crystal.

5. If you still have crystals from the same drop, you could cut a piece from
the right side and take a long exposure. You may then have a chance to index
it.
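
(The back-of-the-envelope calculation mentioned in point 1, as a short Python
sketch. All numbers are placeholders; put in your own wavelength, detector
distance and measured spot separation, and note that this ignores detector
tilt and the curvature of the Ewald sphere.)

import math

wavelength = 1.0    # Angstrom (placeholder)
distance   = 200.0  # mm, crystal-to-detector (placeholder)
dx         = 0.5    # mm, spacing of adjacent spots along a row (placeholder)

# In the small-angle approximation the spacing of adjacent spots along a
# reciprocal-lattice row is roughly distance * wavelength / a, so:
cell_axis = wavelength * distance / dx
print(f"approximate cell axis ~ {cell_axis:.0f} A")

# Resolution of a spot at radial distance r (mm) from the beam centre,
# for a flat detector normal to the beam:
r = 80.0
theta = 0.5 * math.atan(r / distance)
d = wavelength / (2.0 * math.sin(theta))
print(f"resolution at r = {r} mm: d ~ {d:.2f} A")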

Best,

Kevin

On Tue, Dec 8, 2020 at 7:47 AM Joseph Ho  wrote:

> Dear all:
> We recently worked on an 8 kDa protein crystal structure. We obtained
> crystals in a zinc acetate and PEG 8000 condition. However, we observed
> these unusual diffraction patterns. I am wondering if anyone has observed
> this and knows how it can occur. The cryoprotectant is glycerol.
>
> Thank you for your help
>
> Joseph
>
> 
>
>


-- 
Kevin Jin Ph.D

Sharing knowledge  is always joyful..A flash scientist in biology.

Website: http://www.jinkai.org/





Re: [ccp4bb] Coming July 29: Improved Carbohydrate Data at the PDB -- N-glycans are now separate chains if more than one residue

2020-12-08 Thread Jasmine Young

Dear PDB Data Users:

Thank you for providing feedback on the results of an archival-level 
carbohydrate remediation project that led to the re-release of over 
14,000 PDB structures in July 2020. This update includes diverse 
oligosaccharides: glycosylation; metabolites such as maltose, sucrose, 
cellulose fragments; glycosaminoglycans, such as fragments of heparin 
and heparan sulfate; epitope patterns such as A/B blood group antigens 
and the H-type or Lewis-type stems; and many artificial carbohydrates 
mimicking or counting natural products 
(https://www.wwpdb.org/documentation/carbohydrate-remediation).


Starting in 2017, this PDB remediation aimed to standardize the biochemical
nomenclature of the carbohydrate components following the IUPAC-IUBMB
recommendations established by the carbohydrate community
(https://media.iupac.org/publications/pac/1996/pdf/6810x1919.pdf), and to
provide a uniform representation of oligosaccharides to improve the
identification and searchability of oligosaccharides modeled in PDB
structures. During the remediation planning, wwPDB consulted community users
and the PDBx/mmCIF Working Group and made data files available on GitHub in
early 2020 for community feedback. wwPDB has collaborated with Robert Woods at
the University of Georgia in the US, researchers at The Noguchi Institute and
Soka University in Japan, and Thomas Lutteke in Germany to generate uniform
linear descriptors for the oligosaccharide sequences.


To achieve these community goals, each oligosaccharide is represented as a
branched entity with a complete biochemical description and each glycosidic
linkage specified. The full representation of carbohydrates is provided in the
mmCIF format file, but this is not possible in legacy PDB format files, as that
format has been frozen since 2012
(https://www.wwpdb.org/documentation/file-formats-and-the-pdb).


Proper indexing is necessary for the branched entity representation and for
generation of linear descriptors; hence the ordering (numbering) starts at the
reducing end (#1), where the glycosylation occurs, and proceeds in ascending
order to the non-reducing end. Unique chain IDs are assigned to branched
entities (oligosaccharides) to avoid residue numbering overlapping with protein
residues and to enable consistent numbering for every oligosaccharide. For
example, in PDB ID 6WPS there are 5 oligosaccharides associated with the same
protein chain A; the consistent ordering and numbering can only be retained
with a unique chain ID for each oligosaccharide in both PDBx/mmCIF and PDB
format files.


For archival consistency, a single-monosaccharide is defined as a 
non-polymer and treated consistently with other non-polymer ligands in 
the PDB. A single-monosaccharide occurring at a glycosylation site has a 
unique chain ID in the PDBx/mmCIF file (_atom_site.label_asym_id) but 
not in the PDB format file.


Using PDB ID 6WPS as an example, the PDBx/mmCIF data item
_atom_site.label_asym_id, which corresponds to column #7 in the atom_site
coordinates section, has an asym ID 'Y' for the 1st instance of a single
monosaccharide, the NAG bound to ASN 61 of protein chain 'A'. The 'Y' value is
unique to this monosaccharide. The additional chain ID
(_atom_site.auth_asym_id) in the PDBx/mmCIF file that is mapped to the PDB
format file for this NAG is chain 'A', consistent with how any other
non-polymer ligand associated with protein chain A is represented.


#
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
*_atom_site.label_asym_id*
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.pdbx_formal_charge
_atom_site.auth_seq_id
_atom_site.auth_comp_id
*_atom_site.auth_asym_id*
_atom_site.auth_atom_id
_atom_site.pdbx_PDB_model_num
...
HETATM 27655 C C1 . NAG *Y* 6 . ? 191.103 162.375 206.665 1.00 47.28 ? 1301 NAG *A* C1 1
HETATM 27656 C C2 . NAG Y 6 . ? 191.067 161.665 208.065 1.00 47.22 ? 1301 NAG A C2 1
HETATM 27657 C C3 . NAG Y 6 . ? 190.138 160.434 207.960 1.00 47.42 ? 1301 NAG A C3 1
HETATM 27658 C C4 . NAG Y 6 . ? 188.730 160.906 207.541 1.00 48.73 ? 1301 NAG A C4 1
HETATM 27659 C C5 . NAG Y 6 . ? 188.838 161.622 206.176 1.00 48.66 ? 1301 NAG A C5 1
HETATM 27660 C C6 . NAG Y 6 . ? 187.494 162.153 205.709 1.00 48.17 ? 1301 NAG A C6 1
HETATM 27661 C C7 . NAG Y 6 . ? 193.233 161.885 209.217 1.00 47.40 ? 1301 NAG A C7 1
HETATM 27662 C C8 . NAG Y 6 . ? 194.594 161.311 209.471 1.00 47.45 ? 1301 NAG A C8 1
HETATM 27663 N N2 . NAG Y 6 . ? 192.418 161.218 208.414 1.00 47.36 ? 1301 NAG A N2 1
HETATM 27664 O O3 . NAG Y
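
(The identifiers discussed above can also be pulled out programmatically. The
sketch below uses the gemmi Python library and is my own illustration rather
than part of the wwPDB announcement; to the best of my understanding gemmi
exposes label_asym_id as Residue.subchain and the author chain ID as
Chain.name, but check the gemmi documentation. The file path is a placeholder.)

import gemmi

# List, for each NAG residue, the mmCIF label_asym_id next to the author
# chain ID that also appears in the legacy PDB-format file.
st = gemmi.read_structure("6wps.cif")
st.setup_entities()  # make sure subchains and entities are assigned

for model in st:
    for chain in model:
        for residue in chain:
            if residue.name == "NAG":
                print(residue.name, residue.seqid.num,
                      "label_asym_id:", residue.subchain,
                      "auth_asym_id:", chain.name)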

Re: [ccp4bb] seek your opinion on this weird diffraction pattern

2020-12-08 Thread Robert Stroud
The first is a very interesting pattern! - It would be good to know more about 
exactly how the patterns were taken, camera length and dimensions to determine 
the spacings. - The first patterns look very strongly like  Bessel functions of 
 a helical diffraction pattern that is then sampled by a crystal lattice. The 
helices of your 8kDa protein would have made very well ordered crystals, with 
the axis of the helices almost parallel to the horizontal (mounting pin) in the 
30 deg pattern. It looks as if one of the crystal axes is exactly along the 
helix axis. The other 180 deg and 90 deg patterns have the axis of the helix of 
8kDa subunits inclined at some calculable angle.  You should be able to deduce 
the helical pitch and radius, and the crystal repeats hence the entire packing 
in the crystal, - and the structure of the 8kDa subunits.
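
(A generic numerical sketch of the standard fibre-diffraction relations
mentioned above, in Python; it is not an analysis of the actual images, and
every number in it is a placeholder. Layer lines of a continuous helix of pitch
P occur at reciprocal heights Z = n/P, and on layer line n the intensity
follows J_n(2*pi*R*r)^2 for a helix of radius r, so the radial position of the
first intensity maximum gives r.)

import numpy as np
from scipy.special import jv  # Bessel function of the first kind

layer_line_spacing = 0.02          # 1/A, placeholder read off the pattern
pitch = 1.0 / layer_line_spacing
print(f"helical pitch ~ {pitch:.0f} A")

# Position of the first maximum of J_n, found numerically:
n = 1
x = np.linspace(0.01, 30.0, 30000)
x_max = x[np.argmax(jv(n, x))]

R_max = 0.01                       # 1/A, placeholder radial position of the
                                   # first intensity maximum on layer line n
r_helix = x_max / (2.0 * np.pi * R_max)
print(f"estimated helix radius ~ {r_helix:.0f} A")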

The second patterns look like salt - probably some form of hydrated zinc acetate.
The cell dimensions should tell you exactly what it is.
 
Many thanks for the interesting puzzle. I'd love to see the result!
Robert Stroud
str...@msg.ucsf.edu
415 987 7535



> On Dec 8, 2020, at 7:46 AM, Joseph Ho  wrote:
> 
> Dear all:
> We recently worked on 8kDa protein crystal structure. We obtained
> crystals in Zinc acetate and PEG8000 condition. However, we observed
> this unusual diffraction patterns. I am wondering if anyone observed
> this and know how this can occur. The cryoprotectant is glycerol.
> 
> Thank you for you help
> 
> Joseph
> 
> 
> 
> 





Re: [ccp4bb] seek your opinion on this weird diffraction pattern

2020-12-08 Thread Nave, Colin (DLSLtd,RAL,LSCI)
Hi Joseph
Great diffraction patterns. As Jon has just said, it could be zinc acetate. 
There are some very strong spots which would be consistent with a single 
crystal of this. However, something else seems to be present.
Could it instead be PEG crystals with some superlattice repeat? For unit cell 
dimensions of PEG see
https://pubs.acs.org/doi/abs/10.1021/ma60035a005
Some circles giving Bragg spacings would help. Also the rotation range. You 
should at least be able to estimate some of the cell dimensions and see if they 
are consistent with PEG, zinc acetate or some alternative. 
I seem to recall the subject of PEG crystals coming up before.
Colin

-Original Message-
From: CCP4 bulletin board  On Behalf Of Joseph Ho
Sent: 08 December 2020 15:47
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] seek your opinion on this weird diffraction pattern

Dear all:
We recently worked on an 8 kDa protein crystal structure. We obtained crystals
in a zinc acetate and PEG 8000 condition. However, we observed these unusual
diffraction patterns. I am wondering if anyone has observed this and knows how
it can occur. The cryoprotectant is glycerol.

Thank you for your help

Joseph










Re: [ccp4bb] seek your opinion on this weird diffraction pattern

2020-12-08 Thread Jon Cooper
Hello, the first of those diffraction patterns is very beautiful. What was the
concentration of zinc acetate? Also, did you keep the crystal? It might be
worth prodding it with a needle: if it holds together well, or if you hear a
loud crack, it's probably salt. The second image shows a salt-like pattern. I
think those lunes of spots in the first image are some sort of disorder in the
crystal, which I fear is zinc acetate. Best wishes, Jon Cooper.

Sent from ProtonMail mobile

 Original Message 
On 8 Dec 2020, 15:46, Joseph Ho wrote:

> Dear all:
> We recently worked on an 8 kDa protein crystal structure. We obtained
> crystals in a zinc acetate and PEG 8000 condition. However, we observed
> these unusual diffraction patterns. I am wondering if anyone has observed
> this and knows how it can occur. The cryoprotectant is glycerol.
>
> Thank you for your help
>
> Joseph
>
> 
>





Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Tristan Croll
... and of course I meant "between model and target".

From: Tristan Croll 
Sent: 08 December 2020 16:35
To: CCP4BB@JISCMAIL.AC.UK ; Marko Hyvonen 

Subject: Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less 
pipetting (?)

An example: this is TS1038-D1 - designated by the CASP organisers as in the 
"free modelling" category due to the absence of any close homologues in the 
wwPDB. The experimental model is in tan, the AlphaFold2 prediction in cyan. As 
far as I'm concerned, the only way to describe this is "nailed it". Using 
ChimeraX's MatchMaker to do the alignment, 84 of 114 residues align to a 
CA-RMSD of 0.57 A, (2.3 A across all residues, with the outliers being one 
flexible-looking loop and the N-terminal tail). Further than that, it's nailed 
almost all the details - if you exclude surface-exposed residues, I count less 
than half a dozen sidechains with significantly different rotamers compared to 
the template. The upshot is that the difference between model and template 
appears easily within the range of variation you'd expect to see between 
different crystal forms of the same protein.

For comparison, the next best group got the three-strand beta-sheet at bottom 
right essentially correct, but everything else (apart from the vague fold) 
wrong. MatchMaker aligns 28 CA atoms with an RMSD of 0.64 A, but the overall 
CA-RMSD blows out to 9.6 A. So I don't think there's any denying that this is a 
spectacular advance that will change the field markedly.
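
(For anyone who wants to reproduce this kind of number outside ChimeraX, a
bare-bones Kabsch superposition in Python/NumPy is below. It is a generic
sketch, not the MatchMaker algorithm: MatchMaker also does a sequence alignment
and iteratively prunes poorly fitting pairs, which is why its "84 of 114
residues" figure differs from a plain all-pairs CA-RMSD. The two arrays must
contain matched CA coordinates in the same order.)

import numpy as np

def kabsch_rmsd(P, Q):
    # RMSD of Q onto P after optimal rigid-body superposition.
    # P, Q: (N, 3) arrays of matched CA coordinates.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Kabsch: SVD of the 3x3 covariance matrix gives the optimal rotation.
    V, S, Wt = np.linalg.svd(Q.T @ P)
    d = np.sign(np.linalg.det(V @ Wt))   # guard against an improper rotation
    R = V @ np.diag([1.0, 1.0, d]) @ Wt
    diff = Q @ R - P
    return float(np.sqrt((diff * diff).sum() / len(P)))

# e.g. kabsch_rmsd(ca_experimental, ca_predicted) -> CA-RMSD in Angstroms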

Best regards,

Tristan



From: CCP4 bulletin board  on behalf of Marko Hyvonen 

Sent: 08 December 2020 15:07
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less 
pipetting (?)

Hi Ian,

The data on Alphafold2 target RMSDs seems to be correct, but that "resolution 
around 2.5Å", makes no sense, I agree  - had not noticed that before. I can see 
that this has been raised in the Twitter feed comments to his post too.

I was highlighting this more for the alternative viewpoint on the discussion 
and also on the interesting detail on the resources needed/available (assuming 
correct!).

Marko

On 08/12/2020 14:02, Ian Tickle wrote:

Hi Marko

I hope he hasn't confused resolution with RMSD error:

"Just keep in mind that (1) a lower RMSD represents a better predicted 
structure, and that (2) most experimental structures have a resolution around 
2.5 Å. Taking this into consideration, about a third (36%) of Group 427’s 
submitted targets were predicted with a root-mean-square deviation (RMSD) under 
2 Å, and 86% were under 5 Å, with a total mean of 3.8 Å."

Cheers

-- Ian



On Tue, 8 Dec 2020 at 13:51, Marko Hyvonen 
mailto:mh...@cam.ac.uk>> wrote:
Here is another take on this topic, by Carlos Quteiral (@c_outeiral), from a 
non-crystallographer's point of view, covering many of the points discussed in 
this thread  (incl. an example of the model guiding correction of the 
experimental structure).

https://www.blopig.com/blog/2020/12/casp14-what-google-deepminds-alphafold-2-really-achieved-and-what-it-means-for-protein-folding-biology-and-bioinformatics/

Marko

On 08/12/2020 13:25, Tristan Croll wrote:
This is a number that needs to be interpreted with some care. 2 Å crystal 
structures in general achieve an RMSD of 0.2 Å on the portion of the crystal 
that's resolved, including loops that are often only in well-resolved 
conformations due to physiologically-irrelevant crystal packing interactions. 
The predicted models, on the other hand, are in isolation. Once you get to the 
level achieved by this last round of predictions, that starts making fair 
comparison somewhat more difficult*. Two obvious options that I see: (1) limit 
the comparison only to the stable core of the protein (in which case many of 
the predictions have RMSDs in the very low fractions of an Angstrom), or (2) 
compare ensembles derived from MD simulations starting from the experimental 
and predicted structure, and see how well they overlap.

-- Tristan

* There's one more thorny issue when you get to this level: it becomes more and 
more possible (even likely) that the prediction gets some things right that are 
wrong in the experimental structure.

From: CCP4 bulletin board  
on behalf of Ian Tickle 
Sent: 08 December 2020 13:04
To: CCP4BB@JISCMAIL.AC.UK 

Subject: Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less 
pipetting (?)


There was a little bit of press-release hype: the release stated "a score of 
around 90 GDT is informally considered to be competitive with results obtained 
from experimental methods" and "our latest AlphaFold system achieves a median 
score of 92.4 GDT overall across all targets. This means that our predictions 
have an average error 

Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Marko Hyvonen

  
  
Hi Ian,

The data on Alphafold2 target RMSDs seems to be correct, but that "resolution
around 2.5Å" makes no sense, I agree - had not noticed that before. I can see
that this has been raised in the Twitter feed comments to his post too.

I was highlighting this more for the alternative viewpoint on the discussion
and also on the interesting detail on the resources needed/available (assuming
correct!).

Marko

On 08/12/2020 14:02, Ian Tickle wrote:

Hi Marko

I hope he hasn't confused resolution with RMSD error:

"Just keep in mind that (1) a lower RMSD represents a better predicted
structure, and that (2) most experimental structures have a resolution around
2.5 Å. Taking this into consideration, about a third (36%) of Group 427's
submitted targets were predicted with a root-mean-square deviation (RMSD) under
2 Å, and 86% were under 5 Å, with a total mean of 3.8 Å."

Cheers

-- Ian

On Tue, 8 Dec 2020 at 13:51, Marko Hyvonen wrote:

Here is another take on this topic, by Carlos Quteiral (@c_outeiral), from a
non-crystallographer's point of view, covering many of the points discussed in
this thread (incl. an example of the model guiding correction of the
experimental structure).

https://www.blopig.com/blog/2020/12/casp14-what-google-deepminds-alphafold-2-really-achieved-and-what-it-means-for-protein-folding-biology-and-bioinformatics/

Marko

On 08/12/2020 13:25, Tristan Croll wrote:

This is a number that needs to be interpreted with some care. 2 Å crystal
structures in general achieve an RMSD of 0.2 Å on the portion of the crystal
that's resolved, including loops that are often only in well-resolved
conformations due to physiologically-irrelevant crystal packing interactions.
The predicted models, on the other hand, are in isolation. Once you get to the
level achieved by this last round of predictions, that starts making fair
comparison somewhat more difficult*. Two obvious options that I see: (1) limit
the comparison only to the stable core of the protein (in which case many of
the predictions have RMSDs in the very low fractions of an Angstrom), or (2)
compare ensembles derived from MD simulations starting from the experimental
and predicted structure, and see how well they overlap.

-- Tristan

* There's one more thorny issue when you get to this level: it becomes more and
more possible (even likely) that the prediction gets some things right that are
wrong in the experimental structure.

From: CCP4 bulletin board on behalf of Ian Tickle
Sent: 08 December 2020 13:04
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less
pipetting (?)

There was a little bit of press-release hype: the release stated "a score of
around 90 GDT is informally considered to be competitive with results obtained
from experimental methods" and "our latest AlphaFold system achieves a median
score of 92.4 GDT overall across all targets. This means that our predictions
have an average error (RMSD) of approximately 1.6 Angstroms,".

Experimental methods achieve an average error of around 0.2 Ang. or better at
2 Ang. resolution,

Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Bryan Lepore
Greetings — I am interested to know more about the following points to 
understand the results : 

[1] How was the "C-alpha lDDT" (Mariani et al., Bioinformatics, 29(21),
2722-2728, 2013) used? If I understand correctly, the unprecedented and
exceptional prediction capability of AlphaFold2, as compared with the
second-place team's program (which program was that?), rests entirely on how
this score is obtained. Or: how is the GDT score related to the C-alpha lDDT?
(Is the GDT score calculation published?)

Is the C-alpha lDDT score really based solely on C-alpha atoms?

Does CASP from 2006 through the current CASP (CASP14?) use the same score, and
is it the C-alpha lDDT? The C-alpha lDDT score was published in 2013, and the
top scores, as seen in the histogram in Nature News, are roughly flat until
2014, after which the top score appears to increase linearly.

Could the programs use this score to select solutions? Given that the programs
can be using difference distance matrices anyway, isn't it expected to produce
a higher score than, e.g., the RMSD on the peptide backbone?

[2] How exactly are the “poor” predictions of structures determined by NMR to 
be explained, given the excellent predictions otherwise, and how does the 
overall score in the Nature News histogram account for this discrepancy?

[3] how did the training data set increase in size, particularly from 2014? 
More cryo-EM structures?

[4] what “additional information about the physical and geometric constraints 
that determine how a protein folds” was used in AlphaFold2, when was it 
discovered, did other teams use this “additional information”, and is this the 
first year it was used? 

Thanks,

-Bryan

References : 
Nature News column histogram:
https://media.nature.com/lw800/magazine-assets/d41586-020-03348-4/d41586-020-03348-4_18633154.jpg

CASP book of abstracts:
https://predictioncenter.org/casp14/doc/CASP14_Abstracts.pdf (year 2020, I 
assume)




Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Emmanuel Saridakis
Dear John,
Your article touches all the important points about this breakthrough and its 
caveats.
I would just like to add that the ligand problem is of a different order: it is 
fundamentally not about whether, where and how a ligand is predicted to bind, 
but rather about whether it indeed binds where and in the way it is predicted 
to. So I daresay that it is an irreducibly experimental problem.
Best,
Emmanuel

Dr Emmanuel Saridakis
National Centre for Scientific Research DEMOKRITOS
Athens, Greece
- Original Message -
From: John R Helliwell 
To: CCP4BB@JISCMAIL.AC.UK
Sent: Tue, 08 Dec 2020 15:15:14 +0200 (EET)
Subject: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

Dear Isabel,
My article in the IUCr Newsletter on DeepMind and CASP14 is released today and 
can be found here:-
https://www.iucr.org/news/newsletter/volume-28/number-4/deepmind-and-casp14
Best wishes,
John 
Emeritus Professor John R Helliwell DSc




> On 3 Dec 2020, at 11:17, Isabel Garcia-Saez  wrote:
> 
> 
> Dear all,
> 
> Just commenting that after the stunning performance of AlphaFold that uses AI 
> from Google maybe some of us we could dedicate ourselves to the noble art of 
> gardening, baking, doing Chinese Calligraphy, enjoying the clouds pass or 
> everything together (just in case I have already prepared my subscription to 
> Netflix).
> 
> https://www.nature.com/articles/d41586-020-03348-4
> 
> Well, I suppose that we still have the structures of complexes (at the 
> moment). I am wondering how the labs will have access to this technology in 
> the future (would it be for free coming from the company DeepMind - Google?). 
> It seems that they have already published some code. Well, exciting times. 
> 
> Cheers,
> 
> Isabel
> 
> 
> Isabel Garcia-Saez, PhD
> Institut de Biologie Structurale
> Viral Infection and Cancer Group (VIC)-Cell Division Team
> 71, Avenue des Martyrs
> CS 10090
> 38044 Grenoble Cedex 9
> France
> Tel.: 00 33 (0) 457 42 86 15
> e-mail: isabel.gar...@ibs.fr
> FAX: 00 33 (0) 476 50 18 90
> http://www.ibs.fr/
> 
> 




-- 
Dr. Emmanuel Saridakis
Principal Researcher
Institute of Nanoscience and Nanotechnology
National Centre for Scientific Research "DEMOKRITOS"
15310 Athens
GREECE

tel: +30-2106503793
email: e.sarida...@inn.demokritos.gr





Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Ian Tickle
Hi Marko

I hope he hasn't confused resolution with RMSD error:

"Just keep in mind that (1) a lower RMSD represents a better predicted
structure, and that (2) most experimental structures have a resolution
around 2.5 Å. Taking this into consideration, about a third (36%) of Group
427’s submitted targets were predicted with a root-mean-square deviation
(RMSD) under 2 Å, and 86% were under 5 Å, with a total mean of 3.8 Å."

Cheers

-- Ian



On Tue, 8 Dec 2020 at 13:51, Marko Hyvonen  wrote:

> Here is another take on this topic, by Carlos Quteiral (@c_outeiral), from
> a non-crystallographer's point of view, covering many of the points discussed
> in this thread  (incl. an example of the model guiding correction of the
> experimental structure).
>
>
> https://www.blopig.com/blog/2020/12/casp14-what-google-deepminds-alphafold-2-really-achieved-and-what-it-means-for-protein-folding-biology-and-bioinformatics/
>
> Marko
>
> On 08/12/2020 13:25, Tristan Croll wrote:
>
> This is a number that needs to be interpreted with some care. 2 Å crystal
> structures in general achieve an RMSD of 0.2 Å on the portion of the
> crystal that's resolved, including loops that are often only in
> well-resolved conformations due to physiologically-irrelevant crystal
> packing interactions. The predicted models, on the other hand, are in
> isolation. Once you get to the level achieved by this last round of
> predictions, that starts making fair comparison somewhat more difficult*.
> Two obvious options that I see: (1) limit the comparison only to the stable
> core of the protein (in which case many of the predictions have RMSDs in
> the very low fractions of an Angstrom), or (2) compare ensembles derived
> from MD simulations starting from the experimental and predicted structure,
> and see how well they overlap.
>
> -- Tristan
>
> * There's one more thorny issue when you get to this level: it becomes
> more and more possible (even likely) that the prediction gets some things
> right that are wrong in the experimental structure.
> --
> *From:* CCP4 bulletin board 
>  on behalf of Ian Tickle 
> 
> *Sent:* 08 December 2020 13:04
> *To:* CCP4BB@JISCMAIL.AC.UK 
> 
> *Subject:* Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking
> and less pipetting (?)
>
>
> There was a little bit of press-release hype: the release stated "a score
> of around 90 GDT is informally considered to be competitive with results
> obtained from experimental methods" and "our latest AlphaFold system
> achieves a median score of 92.4 GDT overall across all targets. This means
> that our predictions have an average error (RMSD) of approximately 1.6
> Angstroms,".
>
> Experimental methods achieve an average error of around 0.2 Ang. or better
> at 2 Ang. resolution, and of course much better at atomic resolution (1
> Ang. or better), or around 0.5 Ang. at 3 Ang. resolution.  For
> ligand-binding studies I would say you need 3 Ang. resolution or better.
> 1.6 Ang. error is probably equivalent to around 4 Ang. resolution.  No
> doubt that will improve with time and experience, though I think it will be
> an uphill struggle to get better than 1 Ang. error, simply because the
> method can't be better than the data that go into it and 1-1.5 Ang.
> represents a typical spread of homologous models in the PDB.  So yes very
> competitive if you're desperate for a MR starting model, but not quite yet
> there for a refined high-resolution structure.
>
> Cheers
>
> -- Ian
>
>
> On Tue, 8 Dec 2020 at 12:11, Harry Powell - CCP4BB <
> 193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:
>
> Hi
>
> It’s a bit more than science by press release - they took part in CASP14
> where they were given sequences but no other experimental data, and did
> significantly better than the other homology modellers (who had access to
> the same data) when judge by independent analysis. There were things wrong
> with their structures, sure, but in almost every case they were less wrong
> than the other modellers (many of whom have been working on this problem
> for decades).
>
> It _will_ be more impressive once the methods they used (or equivalents)
> are implemented by other groups and are available to the “public” (I
> haven’t found an AlphaFold webserver to submit a sequence to, whereas the
> other groups in the field do make their methods readily available), but
> it’s still a step-change in protein structure prediction - it shows it can
> be done pretty well.
>
> Michel is right, of course; you can’t have homology modelling without
> homologous models, which are drawn from the PDB - but the other modellers
> had the same access to the PDB (just as we all do…).
>
> Just my two ha’porth.
>
> Harry
>
> > On 8 Dec 2020, at 11:33, Goldman, Adrian 
> wrote:
> >
> > My impression is that they haven’t published the code, and it is science
> 

Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Marko Hyvonen

  
  
Here is another take on this topic, by Carlos Quteiral (@c_outeiral), from a
non-crystallographer's point of view, covering many of the points discussed in
this thread (incl. an example of the model guiding correction of the
experimental structure).

https://www.blopig.com/blog/2020/12/casp14-what-google-deepminds-alphafold-2-really-achieved-and-what-it-means-for-protein-folding-biology-and-bioinformatics/

Marko

On 08/12/2020 13:25, Tristan Croll wrote:

This is a number that needs to be interpreted with some care. 2 Å crystal
structures in general achieve an RMSD of 0.2 Å on the portion of the crystal
that's resolved, including loops that are often only in well-resolved
conformations due to physiologically-irrelevant crystal packing interactions.
The predicted models, on the other hand, are in isolation. Once you get to the
level achieved by this last round of predictions, that starts making fair
comparison somewhat more difficult*. Two obvious options that I see: (1) limit
the comparison only to the stable core of the protein (in which case many of
the predictions have RMSDs in the very low fractions of an Angstrom), or (2)
compare ensembles derived from MD simulations starting from the experimental
and predicted structure, and see how well they overlap.

-- Tristan

* There's one more thorny issue when you get to this level: it becomes more and
more possible (even likely) that the prediction gets some things right that are
wrong in the experimental structure.

From: CCP4 bulletin board on behalf of Ian Tickle
Sent: 08 December 2020 13:04
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less
pipetting (?)

There was a little bit of press-release hype: the release stated "a score of
around 90 GDT is informally considered to be competitive with results obtained
from experimental methods" and "our latest AlphaFold system achieves a median
score of 92.4 GDT overall across all targets. This means that our predictions
have an average error (RMSD) of approximately 1.6 Angstroms,".

Experimental methods achieve an average error of around 0.2 Ang. or better at
2 Ang. resolution, and of course much better at atomic resolution (1 Ang. or
better), or around 0.5 Ang. at 3 Ang. resolution. For ligand-binding studies I
would say you need 3 Ang. resolution or better. 1.6 Ang. error is probably
equivalent to around 4 Ang. resolution. No doubt that will improve with time
and experience, though I think it will be an uphill struggle to get better than
1 Ang. error, simply because the method can't be better than the data that go
into it and 1-1.5 Ang. represents a typical spread of homologous models in the
PDB. So yes very competitive if you're desperate for a MR starting model, but
not quite yet there for a refined high-resolution structure.

Cheers

-- Ian

On Tue, 8 Dec 2020 at 12:11, Harry Powell - CCP4BB
<193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:

Hi

It's a bit more than science by press release - they took part in CASP14 where
they were given sequences but no other experimental data, and did significantly
better than the other homology modellers (who had access to the same data) when
judged by independent analysis. There were things wrong with their structures,
sure, but in almost every case they were less wrong than the other modellers
(many of whom have been working on this problem for decades).

It _will_ be more impressive once the methods they used (or equivalents) are
implemented by other groups and are available to the "public" (I haven't found
an AlphaFold

Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Ian Tickle
Hi Tristan,

Point taken: unobserved parts of the structure have a very large (if not
undefined) experimental error!

I'd be interested to see how that average 1.6 Ang. error is distributed in
space: presumably that data is in the CASP analysis somewhere.

Cheers

-- Ian


On Tue, 8 Dec 2020 at 13:25, Tristan Croll  wrote:

> This is a number that needs to be interpreted with some care. 2 Å crystal
> structures in general achieve an RMSD of 0.2 Å on the portion of the
> crystal that's resolved, including loops that are often only in
> well-resolved conformations due to physiologically-irrelevant crystal
> packing interactions. The predicted models, on the other hand, are in
> isolation. Once you get to the level achieved by this last round of
> predictions, that starts making fair comparison somewhat more difficult*.
> Two obvious options that I see: (1) limit the comparison only to the stable
> core of the protein (in which case many of the predictions have RMSDs in
> the very low fractions of an Angstrom), or (2) compare ensembles derived
> from MD simulations starting from the experimental and predicted structure,
> and see how well they overlap.
>
> -- Tristan
>
> * There's one more thorny issue when you get to this level: it becomes
> more and more possible (even likely) that the prediction gets some things
> right that are wrong in the experimental structure.
> --
> *From:* CCP4 bulletin board  on behalf of Ian
> Tickle 
> *Sent:* 08 December 2020 13:04
> *To:* CCP4BB@JISCMAIL.AC.UK 
> *Subject:* Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking
> and less pipetting (?)
>
>
> There was a little bit of press-release hype: the release stated "a score
> of around 90 GDT is informally considered to be competitive with results
> obtained from experimental methods" and "our latest AlphaFold system
> achieves a median score of 92.4 GDT overall across all targets. This means
> that our predictions have an average error (RMSD) of approximately 1.6
> Angstroms,".
>
> Experimental methods achieve an average error of around 0.2 Ang. or better
> at 2 Ang. resolution, and of course much better at atomic resolution (1
> Ang. or better), or around 0.5 Ang. at 3 Ang. resolution.  For
> ligand-binding studies I would say you need 3 Ang. resolution or better.
> 1.6 Ang. error is probably equivalent to around 4 Ang. resolution.  No
> doubt that will improve with time and experience, though I think it will be
> an uphill struggle to get better than 1 Ang. error, simply because the
> method can't be better than the data that go into it and 1-1.5 Ang.
> represents a typical spread of homologous models in the PDB.  So yes very
> competitive if you're desperate for a MR starting model, but not quite yet
> there for a refined high-resolution structure.
>
> Cheers
>
> -- Ian
>
>
> On Tue, 8 Dec 2020 at 12:11, Harry Powell - CCP4BB <
> 193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:
>
> Hi
>
> It’s a bit more than science by press release - they took part in CASP14
> where they were given sequences but no other experimental data, and did
> significantly better than the other homology modellers (who had access to
> the same data) when judge by independent analysis. There were things wrong
> with their structures, sure, but in almost every case they were less wrong
> than the other modellers (many of whom have been working on this problem
> for decades).
>
> It _will_ be more impressive once the methods they used (or equivalents)
> are implemented by other groups and are available to the “public” (I
> haven’t found an AlphaFold webserver to submit a sequence to, whereas the
> other groups in the field do make their methods readily available), but
> it’s still a step-change in protein structure prediction - it shows it can
> be done pretty well.
>
> Michel is right, of course; you can’t have homology modelling without
> homologous models, which are drawn from the PDB - but the other modellers
> had the same access to the PDB (just as we all do…).
>
> Just my two ha’porth.
>
> Harry
>
> > On 8 Dec 2020, at 11:33, Goldman, Adrian 
> wrote:
> >
> > My impression is that they haven’t published the code, and it is science
> by press-release.  If one of us tried it, we would - rightly - get hounded
> out of time.
> >
> > Adrian
> >
> >
> >
> >> On 4 Dec 2020, at 15:57, Michel Fodje 
> wrote:
> >>
> >> I think the results from AlphaFold2, although exciting and a
> breakthrough are being exaggerated just a bit.  We know that all the
> information required for the 3D structure is in the sequence. The protein
> folding problem is simply how to go from a sequence to the 3D structure.
> This is not a complex problem in the sense that cells solve it
> deterministically.  Thus the problem is due to lack of understanding and
> not due to complexity.  AlphaFold and all the others 

Re: [ccp4bb] AW: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Artem Evdokimov
Well that is sad, and true, and also very common. I have personally
experienced dozens of cases where methods from literature do not reproduce
because (and this is important) the authors "just slap some generic
boilerplate" instead of the actual methods. My favorite is always to read
stuff like "such and such protein was cloned into bacterial expression
vector, expressed and purified using standard methods" and then later
find out through considerable effort and twisting hands of original
researchers that the protein can only be expressed when fused with a Spider
Monkey cadherin domain and expressed in minimal medium supplemented with 5%
Pregnant Horse Urine at exactly 13.5 degrees C. And then purified using the
Spider Monkey cadherin monoclonal antibody. And the yield is 1 mg in 24
liters. None of which was ever disclosed in literature...

Sorry for the rant, I guess I am just saying that literature, IMO, has long
ago stopped being generally directly reproducible. Not getting into the
obvious reasons as to why it happened, but still sad that it happened.

Artem

On Tue, Dec 8, 2020, 8:28 AM Hughes, Jonathan <
jon.hug...@bot3.bio.uni-giessen.de> wrote:

> scientific research requires that experimental results must be testable,
> so you have to publish your methods too. if the alphafold2 people don't
> make their code accessible, they are playing a game with different rules.
> maybe it's called capitalism: i gather they're a private company
>
> best
>
> jon
>
>
>
> *From:* CCP4 bulletin board  *On Behalf Of *Goldman,
> Adrian
> *Sent:* Tuesday, 8 December 2020 12:33
> *To:* CCP4BB@JISCMAIL.AC.UK
> *Subject:* Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking
> and less pipetting (?)
>
>
>
> My impression is that they haven’t published the code, and it is science
> by press-release.  If one of us tried it, we would - rightly - get hounded
> out of time.
>
>
>
> Adrian
>
>
>
>
>
>
>
> On 4 Dec 2020, at 15:57, Michel Fodje  wrote:
>
>
>
> I think the results from AlphaFold2, although exciting and a breakthrough
> are being exaggerated just a bit.  We know that all the information
> required for the 3D structure is in the sequence. The protein folding
> problem is simply how to go from a sequence to the 3D structure. This is
> not a complex problem in the sense that cells solve it deterministically.
> Thus the problem is due to lack of understanding and not due to
> complexity.  AlphaFold and all the others trying to solve this problem are
> “cheating” in that they are not just using the sequence, they are using
> other sequences like it (multiple-sequence alignments), and they are using
> all the structural information contained in the PDB.  All of this
> information is not used by the cells.   In short, unless AlphaFold2 now
> allows us to understand how exactly a single protein sequence produces a
> particular 3D structure, the protein folding problem is hardly solved in a
> theoretical sense. The only reason we know how well AlphaFold2 did is
> because the structures were solved and we could compare with the
> predictions, which means verification is lacking.
>
>
>
> The protein folding problem will be solved when we understand how to go
> from a sequence to a structure, and can verify a given structure to be
> correct without experimental data. Even if AlphaFold2 got 99% of structures
> right, your next interesting target protein might be the 1%. How would you
> know?   Until then, what AlphaFold2 is telling us right now is that all
> (most) of the information present in the sequence that determines the 3D
> structure can be gleaned in bits and pieces scattered between homologous
> sequences, multiple-sequence alignments, and other protein 3D structures in
> the PDB.  Deep Learning allows a huge amount of data to be thrown at a
> problem and the back-propagation of the networks then allows careful
> fine-tuning of weights which determine how relevant different pieces of
> information are to the prediction.  The networks used here are humongous
> and a detailed look at the weights (if at all feasible) may point us in the
> right direction.
>
>
>
>
>
> *From:* CCP4 bulletin board  *On Behalf Of *Nave,
> Colin (DLSLtd,RAL,LSCI)
> *Sent:* December 4, 2020 9:14 AM
> *To:* CCP4BB@JISCMAIL.AC.UK
> *Subject:* External: Re: [ccp4bb] AlphaFold: more thinking and less
> pipetting (?)
>
>
>
> The subject line for Isabel’s email is very good.
>
>
>
> I do have a question (more a request) for the more computer scientist
> oriented people. I think it is relevant for where this technology will be
> going. It comes from trying to understand whether problems addressed by
> Alpha are NP, NP hard, NP complete etc. My understanding is that the
> previous successes of Alpha were for complete information games such as
> Chess and Go. Both the rules and the present position were available to
> both sides. The folding problem might be in a different category. It would
> be nice if someone could explain the difference (if any) between Go and the
> protein folding problem perhaps using the NP type categories.
>
> Colin
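
To make Michel's point above about back-propagation a bit more concrete (the fine-tuning of weights that encode how relevant each piece of information is), here is a toy sketch in plain NumPy. It has nothing to do with AlphaFold2's actual architecture, and the data are invented; it only shows the general mechanism: gradient descent drives the weight of an informative input towards its true value and the weight of an uninformative input towards zero.

import numpy as np

# Toy illustration of back-propagation / gradient descent assigning
# "relevance" weights to inputs. NOT AlphaFold2 -- just the general idea.
rng = np.random.default_rng(0)
n = 1000
informative = rng.normal(size=n)        # feature that actually determines y
noise = rng.normal(size=n)              # feature unrelated to y
X = np.column_stack([informative, noise])
y = 3.0 * informative + 0.1 * rng.normal(size=n)

w = np.zeros(2)                         # weights = learned relevance of each feature
lr = 0.1
for _ in range(200):
    residual = X @ w - y
    grad = 2.0 * X.T @ residual / n     # gradient of the mean squared error
    w -= lr * grad                      # the "back-propagated" update

print(w)                                # roughly [3.0, 0.0]

A real network does this over millions of parameters and far richer inputs (MSAs, templates), but the principle of the update is the same.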

[ccp4bb] AW: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Hughes, Jonathan
scientific research requires that experimental results must be testable, so you 
have to publish your methods too. if the alphafold2 people don't make their 
code accessible, they are playing a game with different rules. maybe it's 
called capitalism: i gather they're a private company
best
jon


Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Tristan Croll
This is a number that needs to be interpreted with some care. 2 Å crystal 
structures in general achieve an RMSD of 0.2 Å on the portion of the crystal 
that's resolved, including loops that are often only in well-resolved 
conformations due to physiologically irrelevant crystal packing interactions. 
The predicted models, on the other hand, are in isolation. Once you get to the 
level achieved by this last round of predictions, that starts making fair 
comparison somewhat more difficult*. Two obvious options that I see: (1) limit 
the comparison only to the stable core of the protein (in which case many of 
the predictions have RMSDs in the very low fractions of an Angstrom), or (2) 
compare ensembles derived from MD simulations starting from the experimental 
and predicted structure, and see how well they overlap.

-- Tristan

* There's one more thorny issue when you get to this level: it becomes more and 
more possible (even likely) that the prediction gets some things right that are 
wrong in the experimental structure.
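
For what option (1) might look like in practice, here is a rough sketch (NumPy; it assumes two already-matched N x 3 arrays of CA coordinates, P for the experimental model and Q for the prediction, and the 2 Å cutoff and iteration count are arbitrary choices for illustration, not any agreed standard): superpose with a Kabsch fit, drop the worst-fitting residues, and re-superpose on the remaining core.

import numpy as np

def kabsch_superpose(P, Q):
    """Least-squares superpose Q onto P (both N x 3); returns transformed Q."""
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    V, S, Wt = np.linalg.svd(Qc.T @ Pc)
    d = np.sign(np.linalg.det(V @ Wt))
    R = V @ np.diag([1.0, 1.0, d]) @ Wt     # rotation, guarding against a reflection
    return Qc @ R + P.mean(axis=0)

def core_rmsd(P, Q, cutoff=2.0, max_iter=5):
    """RMSD over an iteratively trimmed core of residues fitting within `cutoff` (Å)."""
    keep = np.ones(len(P), dtype=bool)
    for _ in range(max_iter):
        Qfit = kabsch_superpose(P[keep], Q[keep])
        dev = np.linalg.norm(P[keep] - Qfit, axis=1)
        new_keep = keep.copy()
        new_keep[np.where(keep)[0][dev > cutoff]] = False   # drop poorly fitting residues
        if new_keep.sum() == keep.sum():                    # nothing more to trim
            break
        keep = new_keep
    Qfit = kabsch_superpose(P[keep], Q[keep])
    return np.sqrt(np.mean(np.sum((P[keep] - Qfit) ** 2, axis=1))), int(keep.sum())

Reporting both numbers (the core RMSD and how many residues the core contains) seems fairer than a single global figure, since it separates "how good is the fold" from "how many residues fall outside it".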

From: CCP4 bulletin board  on behalf of Ian Tickle 

Sent: 08 December 2020 13:04
To: CCP4BB@JISCMAIL.AC.UK 
Subject: Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less 
pipetting (?)


There was a little bit of press-release hype: the release stated "a score of 
around 90 GDT is informally considered to be competitive with results obtained 
from experimental methods" and "our latest AlphaFold system achieves a median 
score of 92.4 GDT overall across all targets. This means that our predictions 
have an average error (RMSD) of approximately 1.6 Angstroms,".

Experimental methods achieve an average error of around 0.2 Ang. or better at 2 
Ang. resolution, and of course much better at atomic resolution (1 Ang. or 
better), or around 0.5 Ang. at 3 Ang. resolution.  For ligand-binding studies I 
would say you need 3 Ang. resolution or better.  1.6 Ang. error is probably 
equivalent to around 4 Ang. resolution.  No doubt that will improve with time 
and experience, though I think it will be an uphill struggle to get better than 
1 Ang. error, simply because the method can't be better than the data that go 
into it and 1-1.5 Ang. represents a typical spread of homologous models in the 
PDB.  So yes very competitive if you're desperate for a MR starting model, but 
not quite yet there for a refined high-resolution structure.

Cheers

-- Ian



Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Harry Powell - CCP4BB
Since I didn’t actually read the press release (but “attended” the CASP event 
instead, where the results were discussed in a little more detail…), this is 
news to me, but I’d agree that there is some hyperbole there. 

Alphafold2 (it’s probably important to distinguish it from AlphaFold since it’s 
a complete re-write, from what I understand) was the only software that 
reliably gave results that could be used for MR. I don’t know how long it 
actually took to run the modelling for each sequence (building its libraries 
was, shall we say, “time-consuming”), but it may not be a bad way to get an MR 
model if you can get DeepMind to run the modelling for you.

Harry


Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread John R Helliwell
Dear Isabel,
My article in the IUCr Newsletter on DeepMind and CASP14 was released today and 
can be found here:
https://www.iucr.org/news/newsletter/volume-28/number-4/deepmind-and-casp14
Best wishes,
John 
Emeritus Professor John R Helliwell DSc




> On 3 Dec 2020, at 11:17, Isabel Garcia-Saez  wrote:
> 
> 
> Dear all,
> 
> Just commenting that after the stunning performance of AlphaFold that uses AI 
> from Google, maybe some of us could dedicate ourselves to the noble art of 
> gardening, baking, doing Chinese Calligraphy, enjoying the clouds pass or 
> everything together (just in case I have already prepared my subscription to 
> Netflix).
> 
> https://www.nature.com/articles/d41586-020-03348-4
> 
> Well, I suppose that we still have the structures of complexes (at the 
> moment). I am wondering how the labs will have access to this technology in 
> the future (would it be for free coming from the company DeepMind - Google?). 
> It seems that they have already published some code. Well, exciting times. 
> 
> Cheers,
> 
> Isabel
> 
> 
> Isabel Garcia-Saez, PhD
> Institut de Biologie Structurale
> Viral Infection and Cancer Group (VIC)-Cell Division Team
> 71, Avenue des Martyrs
> CS 10090
> 38044 Grenoble Cedex 9
> France
> Tel.: 00 33 (0) 457 42 86 15
> e-mail: isabel.gar...@ibs.fr
> FAX: 00 33 (0) 476 50 18 90
> http://www.ibs.fr/


Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Ian Tickle
There was a little bit of press-release hype: the release stated "a score
of around 90 GDT is informally considered to be competitive with results
obtained from experimental methods" and "our latest AlphaFold system
achieves a median score of 92.4 GDT overall across all targets. This means
that our predictions have an average error (RMSD) of approximately 1.6 Angstroms,".

Experimental methods achieve an average error of around 0.2 Ang. or better
at 2 Ang. resolution, and of course much better at atomic resolution (1
Ang. or better), or around 0.5 Ang. at 3 Ang. resolution.  For
ligand-binding studies I would say you need 3 Ang. resolution or better.
1.6 Ang. error is probably equivalent to around 4 Ang. resolution.  No
doubt that will improve with time and experience, though I think it will be
an uphill struggle to get better than 1 Ang. error, simply because the
method can't be better than the data that go into it and 1-1.5 Ang.
represents a typical spread of homologous models in the PDB.  So yes very
competitive if you're desperate for a MR starting model, but not quite yet
there for a refined high-resolution structure.

Cheers

-- Ian
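
To make the GDT-versus-RMSD comparison above a bit more concrete, here is a simplified sketch (the real GDT_TS maximizes each fraction over many trial superpositions; this version assumes a single fixed superposition, and the per-residue distances below are invented purely for illustration). GDT_TS averages the fractions of CA atoms within 1, 2, 4 and 8 Ang., so a few badly misplaced residues barely move it, whereas the same outliers dominate an RMSD.

import numpy as np

def gdt_ts(dist):
    """Simplified GDT_TS (%) from per-residue CA deviations in Å, single superposition."""
    return 25.0 * sum(float(np.mean(dist <= t)) for t in (1.0, 2.0, 4.0, 8.0))

def rmsd(dist):
    return float(np.sqrt(np.mean(dist ** 2)))

# Invented example: 95 residues fit to 0.8 Ang., 5 residues are 10 Ang. out.
dist = np.concatenate([np.full(95, 0.8), np.full(5, 10.0)])
print(f"GDT_TS = {gdt_ts(dist):.1f}, RMSD = {rmsd(dist):.2f} Ang.")
# -> GDT_TS = 95.0, RMSD = 2.37 Ang.: the outliers cost little GDT
#    but inflate the RMSD.

Which metric better reflects "usable" accuracy is of course a separate question, but this is one reason a 90+ GDT score and an RMSD-style average error can paint rather different pictures of the same set of models.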



Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Harry Powell - CCP4BB
Hi

It’s a bit more than science by press release - they took part in CASP14 where 
they were given sequences but no other experimental data, and did significantly 
better than the other homology modellers (who had access to the same data) when 
judge by independent analysis. There were things wrong with their structures, 
sure, but in almost every case they were less wrong than the other modellers 
(many of whom have been working on this problem for decades).

It _will_ be more impressive once the methods they used (or equivalents) are 
implemented by other groups and are available to the “public” (I haven’t found 
an AlphaFold webserver to submit a sequence to, whereas the other groups in the 
field do make their methods readily available), but it’s still a step-change in 
protein structure prediction - it shows it can be done pretty well.

Michel is right, of course; you can’t have homology modelling without 
homologous models, which are drawn from the PDB - but the other modellers had 
the same access to the PDB (just as we all do…).

Just my two ha’porth.

Harry


Re: [ccp4bb] External: Re: [ccp4bb] AlphaFold: more thinking and less pipetting (?)

2020-12-08 Thread Goldman, Adrian
My impression is that they haven’t published the code, and it is science by 
press-release.  If one of us tried it, we would - rightly - get hounded out of 
town.

Adrian




[ccp4bb] Postdoc position at Imperial College London

2020-12-08 Thread Zhang, Xiaodong
Dear all,

There is a postdoc position available in my group 
(https://www.imperial.ac.uk/people/xiaodong.zhang). The successful candidate 
will join a multi-disciplinary team of international researchers investigating 
the structures and mechanisms of key components involved in eukaryotic DNA 
damage signalling and repair. We employ a combination of cryo electron 
microscopy, X-ray crystallography, biochemical and biophysical techniques, 
utilising our excellent structural biology facilities at Imperial College 
London.

Details about this post and how to apply can be found at 
http://www.imperial.ac.uk/jobs/description/MED02162/research-associate-structural-biology

For informal enquiries, please email me directly with your CV.

with best wishes

Xiaodong

Professor Xiaodong Zhang
Section of Structural Biology, Department of Infectious Diseases, Imperial 
College London
104, Sir Alexander Fleming Building, South Kensington, London, SW7 2AZ, +44 207 
594 3151, PA: Ms. Kasia Pearce 
(k.pea...@imperial.ac.uk). 0207 594 2763
www.imperial.ac.uk/people/xiaodong.zhang






To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/