Re: [ccp4bb] Coming July 29: Improved Carbohydrate Data at the PDB -- N-glycans are now separate chains if more than one residue

2020-12-16 Jasmine Young
Dear Marcin, Thank you for your feedback. We will improve mmCIF documentation for this recommendation. Most wwPDB remediations do not require changes to chain IDs or residue numbering. For new data representations such as carbohydrates and previous remediation of peptide-like inhibitors,

2020-12-11 Marcin Wojdyr
Dear Jasmine, I fully agree with this recommendation: > To use the wwPDB-assigned chain ID in publications, > _atom_site.auth_seq_id _atom_site.auth_comp_id, and > _atom_site.auth_asym_id can be used for the residue number, residue ID, > and chain ID, respectively. It would help a lot if the

2020-12-10 Engin Özkan
Dear Jasmine, Thank you for contributing to this thread. This has been asked in a different way, but can we simply assume at this point that the mmCIF/PDB records will no longer contain any or separate chain ID-like item that reflects chains including proteins and their glycans, as has been

2020-12-10 Jasmine Young
Dear Marcin, The cif item, _pdbx_branch_scheme.pdb_asym_id, in the pdbx_branch_scheme category is a pointer to _atom_site.auth_asym_id in the atom_site category (I know this is confusing). The labels are consistently defined as the ones in _pdbx_poly_seq_scheme and _pdbx_nonpoly_scheme. To
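The recommendation quoted in this thread (cite residues by _atom_site.auth_asym_id, auth_comp_id and auth_seq_id) and the pdb_asym_id pointer described above can be illustrated with a minimal Python sketch. It assumes the mmCIF file has already been parsed into simple records. The record classes and example values are hypothetical. Only the pdb_asym_id to auth_asym_id correspondence is stated in the thread; treating pdb_seq_num and pdb_mon_id as counterparts of auth_seq_id and auth_comp_id is an assumption made by analogy with _pdbx_poly_seq_scheme.

```python
from typing import NamedTuple


class AtomSite(NamedTuple):
    """A few _atom_site items for one atom (example values are invented)."""
    label_asym_id: str
    auth_asym_id: str
    auth_seq_id: str
    auth_comp_id: str
    label_atom_id: str


class BranchScheme(NamedTuple):
    """One _pdbx_branch_scheme row, reduced to its author-side IDs."""
    pdb_asym_id: str  # points to _atom_site.auth_asym_id (as stated in the thread)
    pdb_seq_num: str  # ASSUMED to correspond to _atom_site.auth_seq_id
    pdb_mon_id: str   # ASSUMED to correspond to _atom_site.auth_comp_id


def residue_label(atom: AtomSite) -> str:
    """Chain, residue name and number as they would be cited in a paper."""
    return f"{atom.auth_asym_id}/{atom.auth_comp_id} {atom.auth_seq_id}"


def atoms_for_branch_residue(row: BranchScheme, atoms: list[AtomSite]) -> list[AtomSite]:
    """Select the atoms of one branched residue by matching author-side IDs."""
    return [
        a for a in atoms
        if (a.auth_asym_id, a.auth_seq_id, a.auth_comp_id)
        == (row.pdb_asym_id, row.pdb_seq_num, row.pdb_mon_id)
    ]
```

Note that label_asym_id is never used for the lookup: the point of the recommendation is that the author-side IDs are the ones meant for humans, while the label IDs are the archive's internal keys.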

2020-12-09 Marcin Wojdyr
Dear Jasmine, thank you for this explanation. It's the best explanation of this remediation I've read. The use of IDs may confuse people, so I'd like to reiterate it and ask for clarification. Every residue in the mmCIF format has three (3) independent chain IDs assigned to it (and three

2020-12-09 Jasmine Young
Dear Robbie, In the case that only a single monosaccharide was modelled at a glycosylation site with a known oligosaccharide sequence, technically the software cannot generate glycosidic linkages, linear descriptors for sequences, 2D SNFG images, etc. Therefore, this single monosaccharide cannot

2020-12-09 Robbie Joosten
Dear Jasmine, I have a few questions about this bit: // As some users pointed out, single NAG could be just a part of the glycan that the author chose to build, as most natural N-glycans must have stem of a common core of 5 monosaccharides or its fucosylated version, such as those modeled in

2020-12-08 Jasmine Young
Dear PDB Data Users: Thank you for providing feedback on the results of an archival-level carbohydrate remediation project that led to the re-release of over 14,000 PDB structures in July 2020. This update includes diverse oligosaccharides: glycosylation; metabolites such as maltose, sucrose,

2020-12-07 Dale Tronrud
We have drifted far from the original topic of this thread and if we continue I'll just make more of a fool of myself. I'll just go back to the original topic that I started with, that encoding connectivity information into an ID is not reliable or sustainable in a relational database.

2020-12-05 Marcin Wojdyr
On Fri, 4 Dec 2020 at 22:36, Dale Tronrud wrote: > > It is very important not to read more meaning into a data tag than > is actually defined in the mmCIF spec. _atom_site.label_seq_id is defined > > http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_atom_site.label_seq_id.html >

2020-12-04 Dale Tronrud
On 12/4/2020 12:15 PM, Marcin Wojdyr wrote: > On Fri, 4 Dec 2020 at 19:16, Dale Tronrud wrote: >> learn the sequence you have to go to the mmCIF records that define the >> connectivity between residues. It is entirely possible that "3" comes >> before "1" because these indexes don't contain any

2020-12-04 Marcin Wojdyr
On Fri, 4 Dec 2020 at 19:16, Dale Tronrud wrote: > > Creating meaning in the chain names "A, B, C, Ag1, Ag2, Ag3" is > exactly the problem. It's not about "creating meaning" but about consistent naming. For humans. > "chain names" ( or "entity identifiers" if I > recall the mmCIF

2020-12-04 Dale Tronrud
I agree that the user experience is very important, but that is not the purpose of a data base design. The data scheme is designed for the storage and manipulation of data by software in a clear and unambiguous way. The presentation of the data to a user is the job of the application

2020-12-04 Robbie Joosten
Ah yes, polymer connectivity depends on the order of appearance not the numbering. On top of that, the connectivity is implicit. There are structures where some chains are numbered in reverse order, especially in double helices. How convenient is it that each base pair has residues of the same
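Robbie's point that implicit polymer connectivity follows the order of appearance rather than the residue numbering can be sketched in a few lines of Python. This assumes the residues of one chain are already listed in file order; the reverse-numbered strand below is invented for illustration.

```python
def implicit_links(residues):
    """Link each residue to the next one in order of appearance.

    Residue numbers are ignored on purpose: in a strand numbered in
    reverse (12, 11, 10) the links are still 12-11 and 11-10.
    """
    return list(zip(residues, residues[1:]))


# Hypothetical reverse-numbered DNA strand, in file order: (number, name)
strand = [(12, "DG"), (11, "DC"), (10, "DA")]

by_order = implicit_links(strand)
# Sorting by residue number first would reverse the chain direction
by_number = implicit_links(sorted(strand))
```

The two results contain the same neighbour pairs but in opposite directions, which is exactly the information lost when numbering is trusted over order.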

Re: [ccp4bb] pdb-l: Re: [ccp4bb] Coming July 29: Improved Carbohydrate Data at the PDB -- N-glycans are now separate chains if more than one residue

2020-12-04 Tristan Croll
No - they're changing the auth_asym_id. See https://www.wwpdb.org/documentation/carbohydrate-remediation: Oligosaccharide molecules are classified as a new entity type, branched, assigned a unique chain ID (_atom_site.auth_asym_id) and a new mmCIF category introduced to define the type of

2020-12-04 Robbie Joosten
Hi Luca, Your point remains completely valid and I agree that residues that can belong to a longer chain should be treated as such. The same problem occurs with peptide ligands (at least in PDB times): if they consist of three residues they would be their own chains, with 2 residues they would not.

2020-12-04 Tristan Croll
OK, I understand your point more clearly now - but I'm not sure I fully agree, for the simple reason that people aren't computers. You're right that for the purposes of software validation tools the chain IDs are essentially arbitrary - as long as they're unique, nothing else really matters.

2020-12-04 Dale Tronrud
Creating meaning in the chain names "A, B, C, Ag1, Ag2, Ag3" is exactly the problem. "chain names" ( or "entity identifiers" if I recall the mmCIF terminology correctly) are simply database "indexes". The values of indices are meaningless in themselves, they are just unique values that can

2020-12-04 radu
Hi Tristan, I fully subscribe to your idea! I was quite surprised to see our model revised with different glycan chain IDs upon PDB annotation. I imagine there must have been some "administrative" reasoning behind this decision, but it's just a nightmare for subsequent visualisation. And, to me

2020-12-04 Tristan Croll
> This suggestion violates a basic principle of data base theory. > A single data item cannot encode two pieces of information. I'm sorry if I was unclear, but I don't believe I was suggesting anything of the sort. Hopefully this example should make it more clear - I'm just suggesting a slight

2020-12-04 Luca Jovine
Dear Dale and Robbie, I agree with your comments! But may I steer the discussion back to the original issue, which is that one-residue N-glycans are now treated differently from multi-residue N-glycans (although they are both covalently linked to a protein chain)? This inconsistency is

2020-12-04 Robbie Joosten
Dear Dale, Yes, good point. Let's stop bending over backwards to come up with faux PDB compatibility and focus on making mmCIF better. There are struct_conn records that describe the linkages. This is enough to reconstruct the connectivity. There is an ongoing debate on how to capture the restraints
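Robbie's observation that the struct_conn records are enough to reconstruct connectivity also speaks to Tristan's concern about losing the association between a glycan chain and its parent protein. A hedged Python sketch: group chains joined by covalent _struct_conn links with a union-find, so each glycan chain is recovered together with the protein it is attached to. The field names follow real _struct_conn items, but the example rows and chain IDs are hypothetical, and treating only conn_type_id "covale" as joining chains is a simplifying assumption.

```python
from typing import NamedTuple


class StructConn(NamedTuple):
    """Subset of _struct_conn items for one link (example rows are invented)."""
    conn_type_id: str  # e.g. "covale", "disulf", "metalc"
    ptnr1_auth_asym_id: str
    ptnr1_auth_seq_id: str
    ptnr1_auth_comp_id: str
    ptnr2_auth_asym_id: str
    ptnr2_auth_seq_id: str
    ptnr2_auth_comp_id: str


def covalent_chain_groups(conns: list[StructConn]) -> list[list[str]]:
    """Group chain IDs that are joined by "covale" links, using union-find."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for c in conns:
        if c.conn_type_id == "covale":
            union(c.ptnr1_auth_asym_id, c.ptnr2_auth_asym_id)

    groups: dict[str, set[str]] = {}
    for chain in list(parent):
        groups.setdefault(find(chain), set()).add(chain)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical entry: glycan chains C and D on protein A, E on protein B
conns = [
    StructConn("covale", "A", "58", "ASN", "C", "1", "NAG"),
    StructConn("covale", "A", "120", "ASN", "D", "1", "NAG"),
    StructConn("covale", "B", "58", "ASN", "E", "1", "NAG"),
    StructConn("disulf", "A", "30", "CYS", "B", "30", "CYS"),  # not merged
]
```

Keeping the link in its own table, and deriving the grouping from it, is in line with Dale's argument that the relationship should not be encoded in the chain ID itself.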

2020-12-04 Dale Tronrud
This suggestion violates a basic principle of data base theory. A single data item cannot encode two pieces of information. The whole structure of CIF falls apart if this is done. Does the new PDB convention contain a CIF record of the link that bridges between the protein chain and

2020-12-04 Marcin Wojdyr
On Fri, 4 Dec 2020 at 09:21, Luca Jovine wrote: > > Yes Tristan, that would be even better - also because such an Ag1, Ag2,… > system could conveniently fall back on a single-character chain A, when > generating legacy PDB format files from the mmCIF ones. mmCIF already has two sets of

2020-12-04 Boaz Shaanan
Hi all, Can someone point me to cases of glycoprotein structures in the PDB for which the old (traditional?) system of naming N- or O-linked chains was found inadequate? Thanks. Stay safe, Boaz Boaz Shaanan, Ph.D. Department of Life Sciences Ben Gurion University of the Negev Beer

2020-12-04 Zhijie Li
Hi Tristan and all, I totally agree that randomly naming the glycan chains is going to give users headaches. But using more than 2 letters would make the entry incompatible with the PDB format, which I wish will remain as a download option for as long as possible. How about restricting the

2020-12-04 Luca Jovine
Yes Tristan, that would be even better - also because such an Ag1, Ag2,… system could conveniently fall back on a single-character chain A, when generating legacy PDB format files from the mmCIF ones. Exactly for the reason that you pointed out, personally I do not understand the logic of
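The fallback Luca describes (glycan chains named Ag1, Ag2, … on protein chain A, collapsing to A when a legacy PDB file is written, since that format has a single-character chain ID field) could look like the Python sketch below. The naming pattern is the hypothetical convention proposed in this thread, not a wwPDB rule, and the function name is illustrative. Merging glycans back into chain A also only works if their residue numbers do not collide with the protein's.

```python
import re

# Hypothetical naming rule from the proposal: glycans on protein chain "A"
# are named "Ag1", "Ag2", ...; protein chains keep a one-character ID.
GLYCAN_ID = re.compile(r"^(?P<parent>[A-Za-z0-9])g(?P<n>\d+)$")


def legacy_chain_id(chain_id: str) -> str:
    """Return the one-character chain ID to write in a legacy PDB file."""
    if len(chain_id) == 1:
        return chain_id
    m = GLYCAN_ID.match(chain_id)
    if m is None:
        # Anything else cannot be represented in the single-character field
        raise ValueError(f"{chain_id!r} does not fit the PDB chain ID field")
    return m.group("parent")
```

Note that this is exactly the kind of ID that carries two pieces of information, which is the design Dale objects to; the sketch only shows that the proposed fallback is mechanically simple.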

2020-12-04 Tristan Croll
To go one step further: in large, heavily glycosylated multi-chain complexes the assignment of a random new chain ID to each glycan will lead to headaches for people building visualisations using existing viewers, because it loses the easy name-based association of glycan to parent protein

2020-12-03 Jan Dohnalek
We also recently encountered this treatment of coordinates - did not find it useful but just left it as it was (life has other more exciting adventures waiting ...). I strongly support the solution suggested by Luca. A nice day to all, Jan Dohnalek On Fri, Dec 4, 2020 at 8:48 AM Luca

2020-12-03 Luca Jovine
CC: pdb-l Dear Zhijie and Robbie, I agree with both of you that the new carbohydrate chain assignment convention that has been recently adopted by PDB introduces confusion, not just for PDB-REDO but also - and especially - for end users. Could we kindly ask PDB to improve consistency by

2020-12-03 Robbie Joosten
Dear Zhijie, In general I like the new treatment of carbohydrates as branched polymers. I didn't realise there was an exception. It makes sense for unlinked carbohydrate ligands, but not for N- or O-glycosylation sites as these might change during model building or, in my case, carbohydrate

2020-12-03 Zhijie Li
Hi all, I was confused when I saw mysterious new glycan chains emerging during PDB deposition and spent quite some time trying to find out what was wrong with my coordinates. Then it occurred to me that a lot of recent structures also had tens of N-glycan chains. Finally I realized that this