Dear Marcin,
Thank you for your feedback. We will improve mmCIF documentation for
this recommendation.
Most wwPDB remediation efforts do not require changes to chain IDs or
residue numbering. For new data representations such as carbohydrates and
previous remediation of peptide-like inhibitors,
Dear Jasmine,
I fully agree with this recommendation:
> To use the wwPDB-assigned chain ID in publications,
> _atom_site.auth_seq_id, _atom_site.auth_comp_id, and
> _atom_site.auth_asym_id can be used for the residue number, residue ID,
> and chain ID, respectively.
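As a minimal sketch of the mapping in the quoted recommendation, the following builds a residue label for a publication from the three auth_* items. The row is a plain dictionary with invented values, and the function name and label layout are assumptions made for illustration, not part of any wwPDB tool.

```python
def residue_label(atom_row):
    """Build a 'chain/residue-ID residue-number' label from the auth_* items."""
    chain_id = atom_row["auth_asym_id"]       # chain ID
    residue_id = atom_row["auth_comp_id"]     # residue ID (component name)
    residue_number = atom_row["auth_seq_id"]  # residue number
    return f"{chain_id}/{residue_id} {residue_number}"


# Hypothetical atom_site row; the values are invented for illustration.
example_row = {
    "label_asym_id": "B",
    "auth_asym_id": "C",
    "auth_comp_id": "NAG",
    "auth_seq_id": "1",
}
label = residue_label(example_row)
print(label)
```

The label_asym_id value is deliberately different from auth_asym_id here, to show that the label is built from the auth_* items only.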
It would help a lot if the
Dear Jasmine,
Thank you for contributing to this thread.
This has been asked in a different way, but can we simply assume at this
point that the mmCIF/PDB records will no longer contain any or separate
chain ID-like item that reflects chains including proteins and their
glycans, as has been
Dear Marcin,
The cif item, _pdbx_branch_scheme.pdb_asym_id, in the pdbx_branch_scheme
category is a pointer to _atom_site.auth_asym_id in the atom_site
category (I know this is confusing). The labels are defined consistently
with the ones in _pdbx_poly_seq_scheme and _pdbx_nonpoly_scheme.
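The pointer relationship described above can be sketched as a simple lookup: rows of pdbx_branch_scheme are matched to atom_site rows through pdb_asym_id and auth_asym_id. This is only an illustration with plain dictionaries; the rows, their values, and the helper name are invented, and a real file would be read with an mmCIF parser.

```python
from collections import defaultdict

# Hypothetical rows; all values are invented for illustration.
branch_scheme = [
    {"asym_id": "B", "pdb_asym_id": "C", "mon_id": "NAG"},
    {"asym_id": "B", "pdb_asym_id": "C", "mon_id": "NAG"},
]
atom_site = [
    {"label_asym_id": "A", "auth_asym_id": "A", "auth_comp_id": "ASN"},
    {"label_asym_id": "B", "auth_asym_id": "C", "auth_comp_id": "NAG"},
    {"label_asym_id": "B", "auth_asym_id": "C", "auth_comp_id": "NAG"},
]


def atoms_for_branch_rows(branch_rows, atom_rows):
    """Group atom_site rows under the auth_asym_id that pdb_asym_id points to."""
    by_auth = defaultdict(list)
    for atom in atom_rows:
        by_auth[atom["auth_asym_id"]].append(atom)
    wanted = {row["pdb_asym_id"] for row in branch_rows}
    return {chain: by_auth.get(chain, []) for chain in wanted}


matched = atoms_for_branch_rows(branch_scheme, atom_site)
print(sorted(matched), len(matched["C"]))
```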
To
Dear Jasmine,
Thank you for this explanation. It's the best explanation of this
remediation I've read.
The use of IDs may confuse people, so I'd like to reiterate it and ask
for clarification.
Every residue in the mmCIF format has three (3) independent chain IDs
assigned to it (and three
Dear Robbie,
In the case that only a single monosaccharide was modelled at a
glycosylation site with a known oligosaccharide sequence, technically
the software cannot generate glycosidic linkages, linear descriptors for
sequences, 2D SNFG images, etc. Therefore, this single monosaccharide
cannot
Dear Jasmine,
I have a few questions about this bit:
//
As some users pointed out, a single NAG could be just a part of the glycan that
the author chose to build, as most natural N-glycans must have a stem of a common
core of 5 monosaccharides or its fucosylated version, such as those modeled in
Dear PDB Data Users:
Thank you for providing feedback on the results of an archival-level
carbohydrate remediation project that led to the re-release of over
14,000 PDB structures in July 2020. This update includes diverse
oligosaccharides: glycosylation; metabolites such as maltose, sucrose,
We have drifted far from the original topic of this thread and if we
continue I'll just make more of a fool of myself.
I'll just go back to the original topic that I started with, that
encoding connectivity information into an ID is not reliable or
sustainable in a relational database.
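To illustrate the point, here is a minimal sketch contrasting the two approaches. The chain IDs and links are invented, and the "Ag1"-style naming only mirrors the suggestion made earlier in this thread. The explicit link table is in the spirit of struct_conn, not a reproduction of it.

```python
# Explicit relation: (protein chain, glycan chain) pairs. IDs are opaque keys.
links = [("A", "B"), ("A", "C")]


def glycans_attached_to(protein_chain, link_rows):
    """Look up attached glycan chains from the link table."""
    return sorted(glycan for protein, glycan in link_rows if protein == protein_chain)


def parent_from_name(chain_id):
    """Fragile alternative: guess the parent from an 'Ag1'-style chain name."""
    return chain_id[0]


# Renaming a glycan chain leaves the explicit relation usable...
renamed_links = [("A", "X"), ("A", "C")]
attached = glycans_attached_to("A", renamed_links)

# ...while the name-based guess now points at a chain that need not exist.
guessed_parent = parent_from_name("X")
print(attached, guessed_parent)
```

With the link table, a chain can be renamed without losing the attachment; with the name-based scheme, the same renaming silently changes the implied parent.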
On Fri, 4 Dec 2020 at 22:36, Dale Tronrud wrote:
>
> It is very important not to read more meaning into a data tag than
> is actually defined in the mmCIF spec. _atom_site.label_seq_id is defined
>
> http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_atom_site.label_seq_id.html
>
On 12/4/2020 12:15 PM, Marcin Wojdyr wrote:
> On Fri, 4 Dec 2020 at 19:16, Dale Tronrud wrote:
>> learn the sequence you have to go to the mmCIF records that define the
>> connectivity between residues. It is entirely possible that "3" comes
>> before "1" because these indexes don't contain any
On Fri, 4 Dec 2020 at 19:16, Dale Tronrud wrote:
>
> Creating meaning in the chain names "A, B, C, Ag1, Ag2, Ag3" is
> exactly the problem.
It's not about "creating meaning" but about consistent naming. For humans.
> "chain names" ( or "entity identifiers" if I
> recall the mmCIF
I agree that the user experience is very important, but that is not
the purpose of a database design. The data scheme is designed for the
storage and manipulation of data by software in a clear and unambiguous
way. The presentation of the data to a user is the job of the
application
Ah yes, polymer connectivity depends on the order of appearance not the
numbering. On top of that, the connectivity is implicit. There are structures
where some chains are numbered in reverse order, especially in double helices.
How convenient is it that each base pair has residues of the same
No - they're changing the auth_asym_id. See
https://www.wwpdb.org/documentation/carbohydrate-remediation:
Oligosaccharide molecules are classified as a new entity type, branched,
assigned a unique chain ID (_atom_site.auth_asym_id) and a new mmCIF category
introduced to define the type of
Hi Luca,
Your point remains completely valid and I agree that residues that can belong
to a longer chain should be treated as such. The same problem is with peptide
ligands (at least in PDB times): if they consist of three residues they would
get their own chains, with 2 residues they would not.
OK, I understand your point more clearly now - but I'm not sure I fully agree,
for the simple reason that people aren't computers. You're right that for the
purposes of software validation tools the chain IDs are essentially arbitrary -
as long as they're unique, nothing else really matters.
Creating meaning in the chain names "A, B, C, Ag1, Ag2, Ag3" is
exactly the problem. "chain names" ( or "entity identifiers" if I
recall the mmCIF terminology correctly) are simply database "indexes".
The values of indices are meaningless in themselves, they are just
unique values that can
Hi Tristan,
I fully subscribe to your idea! I was quite surprised to see our model revised
with different glycan chain IDs upon PDB annotation. I imagine there must have
been some "administrative" reasoning behind this decision, but it's just a
nightmare for subsequent visualisation. And, to me
This suggestion violates a basic principle of database theory. A
single data item cannot encode two pieces of information.
I'm sorry if I was unclear, but I don't believe I was suggesting anything of
the sort. Hopefully this example should make it more clear - I'm just
suggesting a slight
Dear Dale and Robbie,
I agree with your comments! But may I steer the discussion back to the original
issue, which is that one-residue N-glycans are now treated differently from
multi-residue N-glycans (although they are both covalently linked to a protein
chain)? This inconsistency is
Dear Dale,
Yes, good point. Let's stop bending over backwards to come up with faux PDB
compatibility and focus on making mmCIF better.
There are struct_conn records that describe the linkages. This is enough to
reconstruct the connectivity. There is an ongoing debate on how to capture the
restraints
This suggestion violates a basic principle of database theory. A
single data item cannot encode two pieces of information. The whole
structure of CIF falls apart if this is done.
Does the new PDB convention contain a CIF record of the link that
bridges between the protein chain and
On Fri, 4 Dec 2020 at 09:21, Luca Jovine wrote:
>
> Yes Tristan, that would be even better - also because such an Ag1, Ag2,…
> system could conveniently fall back on a single-character chain A, when
> generating legacy PDB format files from the mmCIF ones.
mmCIF already has two sets of
Hi all,
Can someone point me to cases of glycoprotein structures in the PDB for which
the old (traditional?) system of naming N- or O-linked chains was found
inadequate? Thanks.
Stay safe,
Boaz
Boaz Shaanan, Ph.D.
Department of Life Sciences
Ben Gurion University of the Negev
Beer
Hi Tristan and all,
I totally agree that randomly naming the glycan chains is going to give users
headaches. But using more than 2 letters would make the entry incompatible
with the PDB format, which I hope will remain a download option for as long
as possible.
How about restricting the
Yes Tristan, that would be even better - also because such an Ag1, Ag2,… system
could conveniently fall back on a single-character chain A, when generating
legacy PDB format files from the mmCIF ones.
Exactly for the reason that you pointed out, personally I do not understand the
logic of
To go one step further: in large, heavily glycosylated multi-chain complexes
the assignment of a random new chain ID to each glycan will lead to headaches
for people building visualisations using existing viewers, because it loses the
easy name-based association of glycan to parent protein
We also recently encountered this way of treating coordinates - did not
find it useful but just left it as it was (life has other more exciting
adventures waiting ...).
I strongly support the solution suggested by Luca,
A nice day to all,
Jan Dohnalek
On Fri, Dec 4, 2020 at 8:48 AM Luca
CC: pdb-l
Dear Zhijie and Robbie,
I agree with both of you that the new carbohydrate chain assignment convention
that has been recently adopted by PDB introduces confusion, not just for
PDB-REDO but also - and especially - for end users.
Could we kindly ask PDB to improve consistency by
Dear Zhijie,
In general I like the treatment of carbohydrates now as branched polymers. I
didn't realise there was an exception. It makes sense for unlinked carbohydrate
ligands, but not for N- or O-glycosylation sites as these might change during
model building or, in my case, carbohydrate
Hi all,
I was confused when I saw mysterious new glycan chains emerging during PDB
deposition and spent quite some time trying to find out what was wrong with my
coordinates. Then it occurred to me that a lot of recent structures also had
tens of N-glycan chains. Finally I realized that this