Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Frances C. Bernstein

The old BNL PDB format documents are available at

https://www.wwpdb.org/documentation/file-format

I looked at the Feb 1992 document to be certain that my memory was 
correct [I am 6 days shy of 82 and am reassured that my recollection 
was, in fact, correct] for information about the representation of 
hydrogen naming as we had to violate the rule of two characters 
right-adjusted for the chemical symbol and two following location 
characters left-adjusted as we would have needed three location 
characters for hydrogens.  I then looked at 2MB5 which was deposited in 
1989 and has hydrogens.  At BNL we would have put, e.g., 1HD1, 2HD1, 
3HD1 for leucine hydrogens and the current version of the entry has 
HD11, HD12, HD13.  And the redone representation of HEM is totally 
confusing to me.


Given the revisions made by the RCSB PDB it makes sense to use the 
element type and not the atom name.


  Frances Bernstein

On 2024-05-15 11:34, Harry Powell wrote:

Hi Robbie

I’m not actually using PDB files of proteins - I’m using the PDB format 
files in PDBeChem, because at the moment I’m interested in doing stuff 
with ligands/substrates/etc. The charges I’ve seen so far seem to be 
not quite what I’d expect, but I’m prepared to work around that.


Harry

On 15 May 2024, at 16:24, Robbie Joosten  
wrote:


Hi Harry,

It might be better now, but there used to be positively charged 
aspartates in the PDB. You have a better chance taking charges out of 
the CCD for your atoms of interest. I'm not saying all charges in the 
CCD are correct, but they are much more reliable. If you find errors, 
please report them to the proper authority. See it, say it, sorted.


Cheers,
Robbie

On 15 May 2024 14:41, Harry Powell 
<193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:

Hi

This is very, very useful and hits on the four-letter name problem 
that I am encountering - thank you. Saves me trying to produce a new 
design for a circular object with an axle…


For the files that I am trying to use, columns 77-78 are present 
(actually, columns 79-80 are there so I can read the atomic charge as 
well, which is useful for my purposes) so I’m hoping that this will be 
reliable.


Harry



On 15 May 2024, at 12:38, Marcin Wojdyr  wrote:



• Alignment of one-letter atom name such as C starts at column 14, 
while two-letter atom name such as FE starts at column 13.


indicating a rule does exist.


There are programs that don't read/write the element from columns
77-78, so this rule still matters, but using it is less reliable, as
Robbie wrote. After I wrote a function that reads pdb files for gemmi,
over the next few years I received feedback about cases in which the
element columns are absent and the element determined from the atom
name is incorrect. The problem is primarily with 4-character atom
names that can't be aligned, because they use all the four columns
anyway. I added such comments to the code [1] when trying to get it
right:

// Atom names HXXX are ambiguous, but Hg, He, Hf, Ho and Hs (almost)
// never have 4-character names, so H is assumed.

// Similarly Deuterium (DXXX), but here alternatives are Dy, Db and Ds.
// Only Dysprosium is present in the PDB - in a single entry as of 2022.

// Old versions of the PDB format had hydrogen names such as "1HB ".
// Some MD files use similar names for other elements ("1C4A" -> C).

// ... or it can be "C210"

[1] https://github.com/project-gemmi/gemmi/blob/148f37b7c6561c3a255a6a4dd75d6bae888e/include/gemmi/pdb.hpp#L302
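
As a rough illustration of the heuristics described in those comments, here is a minimal Python sketch. It is not gemmi's code: the function name guess_element and the order of the checks are assumptions made for illustration, and it only applies when columns 77-78 are absent.

```python
def guess_element(name_field: str) -> str:
    """Guess an element symbol from the atom-name field (columns 13-16).

    Illustrative only: a simplified reading of the heuristics quoted
    above, not gemmi's actual implementation.
    """
    name = name_field[:4].ljust(4).upper()
    # Column 13 blank: a one-letter element sits in column 14 (" CA " -> C).
    if name[0] == " ":
        return name[1]
    # Old-style or MD-style names with a leading digit ("1HB ", "1C4A").
    if name[0].isdigit():
        return name[1]
    # Four-character names starting with H or D: assume hydrogen/deuterium.
    if name[3] != " " and name[0] in "HD":
        return name[0]
    # Names such as "C210": a digit in column 14 implies a one-letter element.
    if name[1].isdigit():
        return name[0]
    # Otherwise read columns 13-14 as a two-letter symbol ("FE  " -> Fe).
    return name[0] + name[1].lower()


for field in (" CA ", "CA  ", "FE  ", "1HB ", "1C4A", "HD11", "C210"):
    print(repr(field), "->", guess_element(field))
```

Note that a left-justified "CHA " would come out of this sketch as the non-element "Ch", which is exactly the kind of failure that makes the element columns preferable when they are present.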




To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a 
mailing list hosted by www.jiscmail.ac.uk, terms & conditions are 
available at https://www.jiscmail.ac.uk/policyandsecurity/





Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Phil Jeffrey

It says in aforementioned docs:

"Alignment of one-letter atom name such as C starts at column 14, while 
two-letter atom name such as FE starts at column 13."
and no, hopefully they don't mean that, since their example shows plenty 
of 2-letter and 3-letter atom names starting at column 14 in the Example 
section below which directly contradicts that statement.


They mean one-letter and two-letter element names, where possible, but 
as previously discussed there are many atom names that don't fit that 
model.  Plus, since they define element names elsewhere they perhaps 
don't want to conflate this data.


PDB's own format definition is both incorrect and confusing.

Sadly I couldn't find PDB format v2 definitions, to see if the 
description changed.


Phil

(Column numbers starting at 1, I'm having a brief moment of Fortran nostalgia)

On 5/15/24 2:16 PM, Paul Emsley wrote:


On 15/05/2024 18:45, Filipe Maia wrote:




--

It is, I think you would agree, unconventional to put a CA label for a
main-chain carbon at positions 13 and 14 (I have never seen such a
thing). But is it wrong ("Incorrect" - as Harry labels it)? In this
case, putting "CHA" in positions 13-15 is unconventional (again, I have
never seen such a thing) - but is it wrong? The official PDB
documentation, according to my reading at least, is not clear.


As Harry pointed out the documentation at 
https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM says in the "Details" that it's incorrect.




Maybe I am being dense, sorry, but could you be more clear about what 
you mean here?


Paul.






Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Ian Tickle
'CA' for carbon-alpha is a 2-letter atom name so applying the rule exactly
as stated it should start in column 13; it states that only 1-letter atom
names (C, N, O) go in column 14.  This must be a case of poorly-written
documentation; it must mean 'element symbol' not 'atom name' in those cases
where the 'atom name' begins with the 'element symbol', though as Robbie
points out there's no rule that the atom name must begin with the element
symbol.

COLUMNS   DATA TYPE    FIELD     DEFINITION

13 - 16   Atom         name      Atom name.

77 - 78   LString(2)   element   Element symbol, right-justified.


   - Alignment of one-letter atom name such as C starts at column 14, while
   two-letter atom name such as FE starts at column 13.
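
To make the column definitions concrete, here is a minimal Python sketch that slices the atom name (columns 13-16), element (77-78) and charge (79-80) out of one record. The helper name read_atom_fields is hypothetical, and the example line is assembled piece by piece so that the columns are exact; it is an illustration, not a real line copied from entry 3hhb.

```python
def read_atom_fields(line: str) -> dict:
    """Slice fixed-width fields out of an ATOM/HETATM record.

    PDB columns are 1-based and inclusive, so columns 13-16 become the
    Python slice [12:16].
    """
    rec = line.rstrip("\n").ljust(80)  # pad short lines so slices are safe
    return {
        "name": rec[12:16],             # columns 13-16, alignment preserved
        "element": rec[76:78].strip(),  # columns 77-78, right-justified
        "charge": rec[78:80].strip(),   # columns 79-80, often blank
    }


# Example record built from adjacent string pieces, one per field.
example = (
    "HETATM" " 1071" " " "FE  " " " "HEM" " " "A" "   1" "    "
    "   8.128" "   7.371" " -15.022" " 24.00" " 16.74"
    "          " "FE" "  "
)

print(read_atom_fields(example))
```

Keeping the name field unstripped matters here: " CA " and "CA  " only differ in their alignment, so stripping early throws away the one hint that is left when the element columns are blank.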

I.


On Wed, May 15, 2024 at 11:56 AM Harry Powell <
193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:

> Sorry - just read that
>
> >  • Alignment of one-letter atom name such as C starts at column 14,
> while two-letter atom name such as FE starts at column 13.
>
> indicating a rule does exist.
>
> Harry
>
> > On 15 May 2024, at 11:54, Harry Powell <
> 193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:
> >
> > Hi Ezra
> >
> > Thanks for this.
> >
> > In other words, would it be true to say that there are no actual rules
> about what appears in columns 13-16 because “it's a rose by any other name”?
> >
> > Harry
> >
> >> On 15 May 2024, at 11:38, Ezra Peisach  wrote:
> >>
> >> If you take a look at
> https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM
> >>
> >> you will see the following:
> >>
> >> 77 - 78   LString(2)   element   Element symbol, right-justified.
> >>
> >> Going by atom name will get you in trouble.  As you stated calcium vs
> Calpha.  The element symbol comes from the chemical component dictionary.
> >>
> >>
> >> Ezra
> >>
> >>
> >>
> >> On 5/15/24 6:28 AM, Harry Powell wrote:
> >>> Hi folks
> >>>
> >>> I’m sure that this has been answered many times before (I’m sure that
> when I was young I even read it here…), and I *know* that we should all be
> using mmCIF, but I’m using PDB format files generated by a popular Python
> module and I wanted to check the output against a definitive format
> definition (if that’s not tautology).
> >>>
> >>> I noticed this because I was encouraged to try Moorhen and found that
> a HEM (apparently written by this module) did not have the atoms connected
> with bonds in the display.
> >>>
> >>> I’m particularly interested in metal atoms here, and want to be 100%
> sure that I’ve found a calcium, say, and not a C-alpha.
> >>>
> >>> Q: Is it necessary to check columns 77-78 if I really want to be sure?
> >>>
> >>> I’ve read the following, but can’t see anything obvious in “official”
> PDB documentation that what it says here is actually defined anywhere:
> >>>
> >>> Atom names are composed of an atomic (element) symbol right-justified
> >>> in columns 13-14, and trailing identifying characters left-justified in
> >>> columns 15-16. A single-character element symbol should not appear in
> >>> column 13 unless the atom name has four characters (for example, see
> >>> Hydrogen Atoms). Many programs simply left-justify all atom names starting
> >>> in column 13. The difference can be seen clearly in a short segment of
> >>> hemoglobin (entry 3hhb):
> >>>
> >>> Correct:
> >>> HETATM 1071 FE   HEM A   1   8.128   7.371 -15.022 24.00 16.74  FE
> >>> HETATM 1072  CHA HEM A   1   8.617   7.879 -18.361  6.00 17.74   C
> >>> HETATM 1073  CHB HEM A   1  10.356  10.005 -14.319  6.00 18.92   C
> >>> HETATM 1074  CHC HEM A   1   8.307   6.456 -11.669  6.00 11.00   C
> >>> HETATM 1075  CHD HEM A   1   6.928   4.145 -15.725  6.00 13.25   C
> >>>
> >>> Incorrect:
> >>> HETATM 1071 FE   HEM A   1   8.128   7.371 -15.022 24.00 16.74  FE
> >>> HETATM 1072 CHA  HEM A   1   8.617   7.879 -18.361  6.00 17.74   C
> >>> HETATM 1073 CHB  HEM A   1  10.356  10.005 -14.319  6.00 18.92   C
> >>> HETATM 1074 CHC  HEM A   1   8.307   6.456 -11.669  6.00 11.00   C
> >>> HETATM 1075 CHD  HEM A   1   6.928   4.145 -15.725  6.00 13.25   C
> >>>
> >>> I’m sure that someone here will say “why don’t you look at *, it’s
> obvious”, in which case - many thanks!
> >>>
> >>> help
> >>>
> >>> Harry

Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Paul Emsley


On 15/05/2024 18:45, Filipe Maia wrote:




--

It is, I think you would agree, unconventional to put a CA label for a
main-chain carbon at positions 13 and 14 (I have never seen such a
thing). But is it wrong ("Incorrect" - as Harry labels it)? In this
case, putting "CHA" in positions 13-15 is unconventional (again, I have
never seen such a thing) - but is it wrong? The official PDB
documentation, according to my reading at least, is not clear.


As Harry pointed out the documentation at 
https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM 
says in the "Details" that it's incorrect.




Maybe I am being dense, sorry, but could you be more clear about what 
you mean here?


Paul.






Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Filipe Maia
It is, I think you would agree, unconventional to put a CA label for a
main-chain carbon at positions 13 and 14 (I have never seen such a
thing). But is it wrong ("Incorrect" - as Harry labels it)? In this
case, putting "CHA" in positions 13-15 is unconventional (again, I have
never seen such a thing) - but is it wrong? The official PDB
documentation, according to my reading at least, is not clear.

As Harry pointed out the documentation at 
https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM
 says in the "Details" that it's incorrect.

Cheers,
Filipe










Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Harry Powell
Hi Robbie

I’m not actually using PDB files of proteins - I’m using the PDB format files 
in PDBeChem, because at the moment I’m interested in doing stuff with 
ligands/substrates/etc. The charges I’ve seen so far seem to be not quite what 
I’d expect, but I’m prepared to work around that.

Harry

> On 15 May 2024, at 16:24, Robbie Joosten  wrote:
> 
> Hi Harry,
> 
> It might be better now, but there used to be positively charged aspartates in 
> the PDB. You have a better chance taking charges out of the CCD for your 
> atoms of interest. I'm not saying all charges in the CCD are correct, but 
> they are much more reliable. If you find errors, please report them to the 
> proper authority. See it, say it, sorted.
> 
> Cheers,
> Robbie
> 
> On 15 May 2024 14:41, Harry Powell 
> <193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:
> Hi 
> 
> This is very, very useful and hits on the four-letter name problem that I am 
> encountering - thank you. Saves me trying to produce a new design for a 
> circular object with an axle… 
> 
> For the files that I am trying to use, columns 77-78 are present (actually, 
> columns 79-80 are there so I can read the atomic charge as well, which is 
> useful for my purposes) so I’m hoping that this will be reliable. 
> 
> Harry 
> 
> 
>> On 15 May 2024, at 12:38, Marcin Wojdyr  wrote: 
>> 
>>> 
> >>> • Alignment of one-letter atom name such as C starts at column 14, while 
> >>> two-letter atom name such as FE starts at column 13. 
>>> 
>>> indicating a rule does exist. 
>> 
>> There are programs that don't read/write the element from columns 
>> 77-78, so this rule still matters, but using it is less reliable, as 
>> Robbie wrote. After I wrote a function that reads pdb files for gemmi, 
>> over the next few years I received feedback about cases in which the 
>> element columns are absent and the element determined from the atom 
>> name is incorrect. The problem is primarily with 4-character atom 
>> names that can't be aligned, because they use all the four columns 
>> anyway. I added such comments to the code [1] when trying to get it 
>> right: 
>> 
>> // Atom names HXXX are ambiguous, but Hg, He, Hf, Ho and Hs (almost) 
>> // never have 4-character names, so H is assumed. 
>> 
>> // Similarly Deuterium (DXXX), but here alternatives are Dy, Db and Ds. 
>> // Only Dysprosium is present in the PDB - in a single entry as of 2022. 
>> 
>> // Old versions of the PDB format had hydrogen names such as "1HB ". 
>> // Some MD files use similar names for other elements ("1C4A" -> C). 
>> 
>> // ... or it can be "C210" 
>> 
>> [1] 
>> https://github.com/project-gemmi/gemmi/blob/148f37b7c6561c3a255a6a4dd75d6bae888e/include/gemmi/pdb.hpp#L302
>>  
> 
>  
> 
> 





Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Paul Emsley

On 15/05/2024 11:28, Harry Powell wrote:

Hi folks
[...]

I noticed this because I was encouraged to try Moorhen and found that a HEM 
(apparently written by this module) did not have the atoms connected with bonds 
in the display.

Q: Is it necessary to check columns 77-78 if I really want to be sure?

I’ve read the following, but can’t see anything obvious in “official” PDB 
documentation that what it says here is actually defined anywhere:


Atom names are composed of an atomic (element) symbol right-justified in 
columns 13-14, and trailing identifying characters left-justified in columns 
15-16. A single-character element symbol should not appear in column 13 unless 
the atom name has four characters (for example, see Hydrogen Atoms). Many 
programs simply left-justify all atom names starting in column 13. The 
difference can be seen clearly in a short segment of hemoglobin (entry 3hhb):

Correct:
HETATM 1071 FE   HEM A   1   8.128   7.371 -15.022 24.00 16.74  FE
HETATM 1072  CHA HEM A   1   8.617   7.879 -18.361  6.00 17.74   C
HETATM 1073  CHB HEM A   1  10.356  10.005 -14.319  6.00 18.92   C
HETATM 1074  CHC HEM A   1   8.307   6.456 -11.669  6.00 11.00   C
HETATM 1075  CHD HEM A   1   6.928   4.145 -15.725  6.00 13.25   C

Incorrect:
HETATM 1071 FE   HEM A   1   8.128   7.371 -15.022 24.00 16.74  FE
HETATM 1072 CHA  HEM A   1   8.617   7.879 -18.361  6.00 17.74   C
HETATM 1073 CHB  HEM A   1  10.356  10.005 -14.319  6.00 18.92   C
HETATM 1074 CHC  HEM A   1   8.307   6.456 -11.669  6.00 11.00   C
HETATM 1075 CHD  HEM A   1   6.928   4.145 -15.725  6.00 13.25   C




I have a different slant on this - "is there anything that I need to do 
to fix the parsing of the above file?" - or to put it another way, 
"Who's wrong? Moorhen or this file?"


It is, I think you would agree, unconventional to put a CA label for a 
main-chain carbon at positions 13 and 14 (I have never seen such a 
thing). But is it wrong ("Incorrect" - as Harry labels it)? In this 
case, putting "CHA" in positions 13-15 is unconventional (again, I have 
never seen such a thing) - but is it wrong? The official PDB 
documentation, according to my reading at least, is not clear.
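
For anyone writing such files, the convention in the quoted text (element right-justified in columns 13-14, remainder left-justified in 15-16, four-character names filling the field) can be sketched in a few lines of Python. This is a minimal sketch of that convention only; the function name align_atom_name is hypothetical and it does not settle whether the left-justified form is formally "wrong".

```python
def align_atom_name(name: str, element: str) -> str:
    """Place an atom name in the 4-column field (columns 13-16).

    A two-letter element, or a four-character name, starts in column 13;
    a one-letter element with a shorter name starts in column 14.
    """
    name = name.strip()
    element = element.strip()
    if not 1 <= len(name) <= 4:
        raise ValueError(f"atom name must be 1-4 characters: {name!r}")
    if len(name) == 4 or len(element) == 2:
        return name.ljust(4)
    return (" " + name).ljust(4)


for name, element in (("FE", "FE"), ("CHA", "C"), ("CA", "C"), ("CA", "CA")):
    print(repr(align_atom_name(name, element)))
```

With this rule the haem atom CHA is written as " CHA" and the iron as "FE  ", matching the block labelled "Correct" above, while a calcium named CA and a C-alpha named CA end up in different columns.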


Regards,

Paul.





Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Robbie Joosten
Hi Harry,

It might be better now, but there used to be positively charged aspartates in the PDB. You have a better chance taking charges out of the CCD for your atoms of interest. I'm not saying all charges in the CCD are correct, but they are much more reliable. If you find errors, please report them to the proper authority. See it, say it, sorted.

Cheers,
Robbie

On 15 May 2024 14:41, Harry Powell <193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:

Hi

This is very, very useful and hits on the four-letter name problem that I am encountering - thank you. Saves me trying to produce a new design for a circular object with an axle…

For the files that I am trying to use, columns 77-78 are present (actually, columns 79-80 are there so I can read the atomic charge as well, which is useful for my purposes) so I’m hoping that this will be reliable. 

Harry


> On 15 May 2024, at 12:38, Marcin Wojdyr  wrote:
> 
>> 
>>> • Alignment of one-letter atom name such as C starts at column 14, while two-letter atom name such as FE starts at column 13.
>> 
>> indicating a rule does exist.
> 
> There are programs that don't read/write the element from columns
> 77-78, so this rule still matters, but using it is less reliable, as
> Robbie wrote. After I wrote a function that reads pdb files for gemmi,
> over the next few years I received feedback about cases in which the
> element columns are absent and the element determined from the atom
> name is incorrect. The problem is primarily with 4-character atom
> names that can't be aligned, because they use all the four columns
> anyway. I added such comments to the code [1] when trying to get it
> right:
> 
>  // Atom names HXXX are ambiguous, but Hg, He, Hf, Ho and Hs (almost)
>  // never have 4-character names, so H is assumed.
> 
>  // Similarly Deuterium (DXXX), but here alternatives are Dy, Db and Ds.
>  // Only Dysprosium is present in the PDB - in a single entry as of 2022.
> 
>  // Old versions of the PDB format had hydrogen names such as "1HB ".
>  // Some MD files use similar names for other elements ("1C4A" -> C).
> 
>  // ... or it can be "C210"
> 
> [1] https://github.com/project-gemmi/gemmi/blob/148f37b7c6561c3a255a6a4dd75d6bae888e/include/gemmi/pdb.hpp#L302






Re: [ccp4bb] AW: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Marcin Wojdyr
> It sounds as though you need the power of the script. You can (from memory) 
> run pdbcur to drop the aniso lines and hydrogen atoms, which helps.

Or from command-line:
gemmi convert --anisou=no --remove-h in.pdb out.pdb

> You could probably get it to delete everything except CA's too.

this would be:
gemmi convert --select='CA[C]' --anisou=no --minimal in.pdb out.pdb

(--minimal drops REMARKs and other metadata)
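
If no extra tools can be installed, the original task (B-factors of C-alpha atoms, ignoring the interleaved ANISOU lines) can also be sketched in plain Python. This is a minimal sketch under the usual fixed-column assumptions (B-factor in columns 61-66, chain in column 22, residue number in columns 23-26); the function name ca_b_factors is hypothetical and alternate conformations and insertion codes are ignored.

```python
def ca_b_factors(lines):
    """Return (chain, residue number, B-factor) for each C-alpha atom."""
    results = []
    for line in lines:
        # Only coordinate records; ANISOU, REMARK etc. are skipped.
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        rec = line.rstrip("\n").ljust(80)
        name = rec[12:16]
        if name.strip() != "CA":
            continue
        element = rec[76:78].strip().upper()
        # Prefer the element column; fall back to name alignment if blank.
        is_c_alpha = (element == "C") if element else (name == " CA ")
        if is_c_alpha:
            results.append((rec[21], int(rec[22:26]), float(rec[60:66])))
    return results
```

It would be called on an open file, for example ca_b_factors(open("in.pdb")), where in.pdb is a placeholder file name. The element check is what keeps calcium ions out of the list.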





Re: [ccp4bb] AW: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Martin Malý

Dear Jon,

If I understand your question right, I would use Gemmi for this purpose:
https://gemmi.readthedocs.io/en/latest/mol.html
https://gemmi.readthedocs.io/en/latest/analysis.html

It's not in GUI, it involves scripting in Python. It's a very powerful 
tool and capable of working with both PDB and mmCIF formats and with 
both proteins and nucleic acids.

Cheers,
Martin

On 15/05/2024 13:11, Hughes, Jonathan wrote:

hello CCP4 people,
rather off-topic: is there a purpose-written windows editor for PDB files? with 
interleaved anisotropy lines, missing column delimiters etc., simply extracting 
the B-factors for Ca atoms is hard work using a standard character editor. 
would anyone think of working with DNA without proper tools?
best
jon

--
Prof. Dr. Jon Hughes
Department of Physics
Free University of Berlin
&
Institute for Plant Physiology
Justus Liebig University
Giessen
Germany






Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Andy Purkiss
For those wanting a text editor solution, there is the purpose built pdb-mode 
plugin for (x)emacs which works under most operating systems.

The download location has moved around a bit, but a current version is 
available from
https://github.com/mmagnus/emacs-pdb-mode/

with more details at
https://bondxray.org/software/pdb-mode/

Hope this helps,

Andy Purkiss


From: CCP4 bulletin board  on behalf of Hughes, Jonathan 

Sent: 15 May 2024 13:11
To: CCP4BB@JISCMAIL.AC.UK 
Subject: [ccp4bb] AW: [ccp4bb] a dinosaur asks ... PDB format query




hello CCP4 people,
rather off-topic: is there a purpose-written windows editor for PDB files? with 
interleaved anisotropy lines, missing column delimiters etc., simply extracting 
the B-factors for Ca atoms is hard work using a standard character editor. 
would anyone think of working with DNA without proper tools?
best
jon

--
Prof. Dr. Jon Hughes
Department of Physics
Free University of Berlin
&
Institute for Plant Physiology
Justus Liebig University
Giessen
Germany





The Francis Crick Institute Limited is a registered charity in England and 
Wales no. 1140062 and a company registered in England and Wales no. 06885462, 
with its registered office at 1 Midland Road London NW1 1AT





Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Harry Powell
Hi

This is very, very useful and hits on the four-letter name problem that I am 
encountering - thank you. Saves me trying to produce a new design for a 
circular object with an axle…

For the files that I am trying to use, columns 77-78 are present (actually, 
columns 79-80 are there so I can read the atomic charge as well, which is 
useful for my purposes) so I’m hoping that this will be reliable. 
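
For reading the charge from columns 79-80, a minimal Python sketch follows. It assumes the usual "digit then sign" form such as 2+ or 1-; accepting the sign-first form as well is a tolerance added here as an assumption, and the function name parse_charge is hypothetical.

```python
def parse_charge(field: str) -> int:
    """Convert the 2-character charge field (columns 79-80) to an integer.

    A blank field is treated as zero. Both "2+" and "+2" are accepted;
    the second form is an assumed tolerance for non-standard writers.
    """
    text = field.strip()
    if not text:
        return 0
    if len(text) == 2:
        if text[0].isdigit() and text[1] in "+-":
            digit, sign = text[0], text[1]
        elif text[0] in "+-" and text[1].isdigit():
            sign, digit = text[0], text[1]
        else:
            raise ValueError(f"unrecognised charge field: {field!r}")
        return int(digit) if sign == "+" else -int(digit)
    raise ValueError(f"unrecognised charge field: {field!r}")


for raw in ("2+", "1-", "  "):
    print(repr(raw), "->", parse_charge(raw))
```

Raising on anything unexpected, rather than silently returning zero, makes it easier to spot the odd charges mentioned elsewhere in this thread.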

Harry


> On 15 May 2024, at 12:38, Marcin Wojdyr  wrote:
> 
>> 
>>> • Alignment of one-letter atom name such as C starts at column 14, while 
>>> two-letter atom name such as FE starts at column 13.
>> 
>> indicating a rule does exist.
> 
> There are programs that don't read/write the element from columns
> 77-78, so this rule still matters, but using it is less reliable, as
> Robbie wrote. After I wrote a function that reads pdb files for gemmi,
> over the next few years I received feedback about cases in which the
> element columns are absent and the element determined from the atom
> name is incorrect. The problem is primarily with 4-character atom
> names that can't be aligned, because they use all the four columns
> anyway. I added such comments to the code [1] when trying to get it
> right:
> 
>  // Atom names HXXX are ambiguous, but Hg, He, Hf, Ho and Hs (almost)
>  // never have 4-character names, so H is assumed.
> 
>  // Similarly Deuterium (DXXX), but here alternatives are Dy, Db and Ds.
>  // Only Dysprosium is present in the PDB - in a single entry as of 2022.
> 
>  // Old versions of the PDB format had hydrogen names such as "1HB ".
>  // Some MD files use similar names for other elements ("1C4A" -> C).
> 
>  // ... or it can be "C210"
> 
> [1] 
> https://github.com/project-gemmi/gemmi/blob/148f37b7c6561c3a255a6a4dd75d6bae888e/include/gemmi/pdb.hpp#L302





Re: [ccp4bb] AW: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Jon Cooper
You could probably get it to delete everything except CA's too.

Best wishes, Jon Cooper. jon.b.coo...@protonmail.com

Sent from Proton Mail mobile

 Original Message 
On 15 May 2024, 13:22, Jon Cooper wrote:

> It sounds as though you need the power of the script. You can (from memory) 
> run pdbcur to drop the aniso lines and hydrogen atoms, which helps.
>
> Best wishes, Jon Cooper. jon.b.coo...@protonmail.com
>
> Sent from Proton Mail mobile
>
>  Original Message 
> On 15 May 2024, 13:11, Hughes, Jonathan wrote:
>
>> hello CCP4 people,
>> rather off-topic: is there a purpose-written windows editor for PDB files?
>> with interleaved anisotropy lines, missing column delimiters etc., simply
>> extracting the B-factors for Ca atoms is hard work using a standard
>> character editor. would anyone think of working with DNA without proper
>> tools?
>> best
>> jon
>>
>> --
>> Prof. Dr. Jon Hughes
>> Department of Physics
>> Free University of Berlin
>> &
>> Institute for Plant Physiology
>> Justus Liebig University
>> Giessen
>> Germany
>>





Re: [ccp4bb] AW: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Jon Cooper
It sounds as though you need the power of the script. You can (from memory) run 
pdbcur to drop the aniso lines and hydrogen atoms, which helps.
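
If a script is the way to go, a minimal Python sketch along these lines would pull out the C-alpha B-factors directly. It assumes standard fixed-column ATOM records (atom name in columns 13-16, B-factor in columns 61-66). The function name and the sample records are made up for illustration and are not taken from any PDB entry.

```python
def ca_b_factors(pdb_lines):
    """Return (chain, residue number, residue name, B-factor) for each C-alpha."""
    results = []
    for line in pdb_lines:
        # Keep ATOM records only: ANISOU lines are skipped, and so are HETATM
        # records, where an atom named CA is usually a calcium ion.
        if not line.startswith("ATOM  "):
            continue
        # Columns 13-16: C-alpha is " CA " (one-letter element in column 14).
        if line[12:16] != " CA ":
            continue
        chain = line[21]                # column 22
        res_seq = int(line[22:26])      # columns 23-26
        res_name = line[17:20]          # columns 18-20
        b_factor = float(line[60:66])   # columns 61-66
        results.append((chain, res_seq, res_name, b_factor))
    return results


# Made-up records for illustration only.
sample = [
    "ATOM      2  CA  ALA A   1      11.104   6.134  -6.504  1.00 12.34           C",
    "ANISOU    2  CA  ALA A   1     1234   1234   1234      0      0      0       C",
    "ATOM      3  C   ALA A   1      12.560   6.020  -6.100  1.00 13.10           C",
    "ATOM     10  CA  GLY A   2      12.000   7.000  -5.000  1.00 15.60           C",
    "HETATM  900 CA    CA A 201      20.000  21.000  22.000  1.00 30.00          CA",
]

for chain, res_seq, res_name, b in ca_b_factors(sample):
    print(chain, res_seq, res_name, b)
```

Note that this sketch also drops modified residues stored as HETATM (MSE, for example), so it is only a starting point.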

Best wishes, Jon Cooper. jon.b.coo...@protonmail.com

Sent from Proton Mail mobile

 Original Message 
On 15 May 2024, 13:11, Hughes, Jonathan wrote:

> hello CCP4 people,
> rather off-topic: is there a purpose-written windows editor for PDB files?
> with interleaved anisotropy lines, missing column delimiters etc., simply
> extracting the B-factors for Ca atoms is hard work using a standard
> character editor. would anyone think of working with DNA without proper
> tools?
> best
> jon
>
> --
> Prof. Dr. Jon Hughes
> Department of Physics
> Free University of Berlin
> &
> Institute for Plant Physiology
> Justus Liebig University
> Giessen
> Germany





[ccp4bb] AW: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Hughes, Jonathan
hello CCP4 people,
rather off-topic: is there a purpose-written windows editor for PDB files? with 
interleaved anisotropy lines, missing column delimiters etc., simply extracting 
the B-factors for Ca atoms is hard work using a standard character editor. 
would anyone think of working with DNA without proper tools? 
best
jon

--
Prof. Dr. Jon Hughes
Department of Physics
Free University of Berlin
&
Institute for Plant Physiology
Justus Liebig University
Giessen
Germany






Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Paul Bond
It would also be good to check the monomer library (expanded with any
user-supplied dictionaries). Cases where an element in columns 77-78 exists
and it does not agree with the component definition should probably be
flagged up.
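
A check of this kind could be sketched in Python as below. The COMPONENT_ELEMENTS table is a tiny hand-written stand-in for a real monomer library, the function name is hypothetical, and the second record has a deliberately wrong element so that there is something to flag.

```python
# Hand-written stand-in for a monomer library: (residue name, atom name) -> element.
COMPONENT_ELEMENTS = {
    ("HEM", "FE"): "FE",
    ("HEM", "CHA"): "C",
}


def flag_element_mismatches(pdb_lines, definitions):
    """Return (serial, residue, atom, element found, element expected) tuples."""
    problems = []
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        found = line[76:78].strip().upper()   # columns 77-78
        if not found:
            continue  # element columns absent: nothing to compare
        key = (line[17:20].strip(), line[12:16].strip())
        expected = definitions.get(key)
        # Atoms missing from the table are left alone rather than flagged.
        if expected is not None and expected != found:
            problems.append((int(line[6:11]), key[0], key[1], found, expected))
    return problems


records = [
    "HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE",
    # Element deliberately changed from C to CA for the demonstration.
    "HETATM 1072  CHA HEM A   1       8.617   7.879 -18.361  6.00 17.74          CA",
]
print(flag_element_mismatches(records, COMPONENT_ELEMENTS))
```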

Cheers,
Paul

On Wed, 15 May 2024 at 12:39, Marcin Wojdyr  wrote:

> >
> > >  • Alignment of one-letter atom name such as C starts at column 14,
> > > while two-letter atom name such as FE starts at column 13.
> >
> > indicating a rule does exist.
>
> There are programs that don't read/write the element from columns
> 77-78, so this rule still matters, but using it is less reliable, as
> Robbie wrote. After I wrote a function that reads pdb files for gemmi,
> over the next few years I received feedback about cases in which the
> element columns are absent and the element determined from the atom
> name is incorrect. The problem is primarily with 4-character atom
> names that can't be aligned, because they use all the four columns
> anyway. I added such comments to the code [1] when trying to get it
> right:
>
>   // Atom names HXXX are ambiguous, but Hg, He, Hf, Ho and Hs (almost)
>   // never have 4-character names, so H is assumed.
>
>   // Similarly Deuterium (DXXX), but here alternatives are Dy, Db and Ds.
>   // Only Dysprosium is present in the PDB - in a single entry as of 2022.
>
>   // Old versions of the PDB format had hydrogen names such as "1HB ".
>   // Some MD files use similar names for other elements ("1C4A" -> C).
>
>   // ... or it can be "C210"
>
> [1]
> https://github.com/project-gemmi/gemmi/blob/148f37b7c6561c3a255a6a4dd75d6bae888e/include/gemmi/pdb.hpp#L302
>
> 
>
>





Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Marcin Wojdyr
>
> >  • Alignment of one-letter atom name such as C starts at column 14, while 
> > two-letter atom name such as FE starts at column 13.
>
> indicating a rule does exist.

There are programs that don't read/write the element from columns
77-78, so this rule still matters, but using it is less reliable, as
Robbie wrote. After I wrote a function that reads pdb files for gemmi,
over the next few years I received feedback about cases in which the
element columns are absent and the element determined from the atom
name is incorrect. The problem is primarily with 4-character atom
names that can't be aligned, because they use all the four columns
anyway. I added such comments to the code [1] when trying to get it
right:

  // Atom names HXXX are ambiguous, but Hg, He, Hf, Ho and Hs (almost)
  // never have 4-character names, so H is assumed.

  // Similarly Deuterium (DXXX), but here alternatives are Dy, Db and Ds.
  // Only Dysprosium is present in the PDB - in a single entry as of 2022.

  // Old versions of the PDB format had hydrogen names such as "1HB ".
  // Some MD files use similar names for other elements ("1C4A" -> C).

  // ... or it can be "C210"

[1] 
https://github.com/project-gemmi/gemmi/blob/148f37b7c6561c3a255a6a4dd75d6bae888e/include/gemmi/pdb.hpp#L302
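
To make the ambiguity concrete, here is a rough Python sketch of a name-based guess in the spirit of the comments quoted above. It is not the gemmi code: the function name is hypothetical, and treating every four-character name as a one-letter element is a simplifying assumption.

```python
def guess_element(name_field):
    """Guess an element from the atom-name field (columns 13-16), fallback only."""
    name = name_field.ljust(4)[:4]
    if not name.strip():
        return ""
    # Old-style or MD-style names with a leading digit: "1HB " -> H, "1C4A" -> C.
    if name[0].isdigit():
        return name[1].upper()
    # Aligned one-letter element: column 13 blank, symbol in column 14.
    if name[0] == " ":
        return name[1].upper()
    # Four-character names fill every column, so alignment says nothing.
    # Assume a one-letter element: "HD11" -> H, "C210" -> C.
    if name[3] != " ":
        return name[0].upper()
    # Otherwise read columns 13-14 as a two-letter symbol: "FE  " -> FE.
    if name[1].isalpha():
        return name[:2].upper()
    return name[0].upper()


for field in (" CA ", "CA  ", "FE  ", "HD11", "1HB ", "1C4A", "C210", "CHA "):
    print(repr(field), "->", guess_element(field))
```

The last case is the failure mode discussed in this thread: a carbon name left-justified as "CHA " comes out as "CH", which is why columns 77-78 are the safer source whenever they are present.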





Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Robbie Joosten
Hi Harry,

Deducing the element from the atom name has always been unreliable so since
PDB version 3 you have to get it from columns 77-78. There is no implied
element in the atom name anymore.

HTH,
Robbie

On 15 May 2024 12:28, Harry Powell <193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:

Hi folks

I’m sure that this has been answered many times before (I’m sure that when I
was young I even read it here…), and I *know* that we should all be using
mmCIF, but I’m using PDB format files generated by a popular Python module and
I wanted to check the output against a definitive format definition (if that’s
not tautology).

I noticed this because I was encouraged to try Moorhen and found that a HEM
(apparently written by this module) did not have the atoms connected with bonds
in the display.

I’m particularly interested in metal atoms here, and want to be 100% sure that
I’ve found a calcium, say, and not a C-alpha.

Q: Is it necessary to check columns 77-78 if I really want to be sure?

I’ve read the following, but can’t see anything obvious in “official” PDB
documentation that what it says here is actually defined anywhere:

> Atom names are composed of an atomic (element) symbol right-justified in
> columns 13-14, and trailing identifying characters left-justified in columns
> 15-16. A single-character element symbol should not appear in column 13
> unless the atom name has four characters (for example, see Hydrogen Atoms).
> Many programs simply left-justify all atom names starting in column 13. The
> difference can be seen clearly in a short segment of hemoglobin (entry 3hhb):
>
> Correct:
> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
> HETATM 1072  CHA HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
> HETATM 1073  CHB HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
> HETATM 1074  CHC HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
> HETATM 1075  CHD HEM A   1       6.928   4.145 -15.725  6.00 13.25           C
>
> Incorrect:
> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
> HETATM 1072 CHA  HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
> HETATM 1073 CHB  HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
> HETATM 1074 CHC  HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
> HETATM 1075 CHD  HEM A   1       6.928   4.145 -15.725  6.00 13.25           C

I’m sure that someone here will say “why don’t you look at *, it’s
obvious”, in which case - many thanks!

help

Harry




Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Harry Powell
Sorry - just read that 

>  • Alignment of one-letter atom name such as C starts at column 14, while 
> two-letter atom name such as FE starts at column 13.

indicating a rule does exist.
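
Read that way, the rule for writing columns 13-16 could be sketched in Python as follows. The function name is hypothetical, and the element has to be passed in separately, since the name alone does not say whether CA is calcium or C-alpha.

```python
def atom_name_field(name, element):
    """Return the 4-character atom-name field for columns 13-16."""
    name = name.strip()
    element = element.strip()
    if not 1 <= len(name) <= 4:
        raise ValueError("atom name must be 1-4 characters: %r" % name)
    # Two-letter elements start in column 13; four-character names have no
    # room for padding and also start in column 13.
    if len(element) == 2 or len(name) == 4:
        return name.ljust(4)
    # One-letter elements start in column 14, leaving column 13 blank.
    return (" " + name).ljust(4)


for name, element in (("CA", "C"), ("CA", "CA"), ("FE", "FE"), ("CHA", "C"), ("HD11", "H")):
    print(repr(atom_name_field(name, element)), element)
```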

Harry 

> On 15 May 2024, at 11:54, Harry Powell 
> <193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:
> 
> Hi Ezra
> 
> Thanks for this. 
> 
> In other words, would it be true to say that there are no actual rules about 
> what appears in columns 13-16 because “it's a rose by any other name”?
> 
> Harry
> 
>> On 15 May 2024, at 11:38, Ezra Peisach  wrote:
>> 
>> If you take a look at 
>> https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM
>> 
>> you will see the following:
>> 
>> 77 - 78    LString(2)    element  Element symbol, right-justified.
>> 
>> Going by atom name will get you in trouble.  As you stated calcium vs 
>> Calpha.  The element symbol comes from the chemical component dictionary.
>> 
>> 
>> Ezra
>> 
>> 
>> 
>> On 5/15/24 6:28 AM, Harry Powell wrote:
>>> Hi folks
>>> 
>>> I’m sure that this has been answered many times before (I’m sure that when 
>>> I was young I even read it here…), and I *know* that we should all be using 
>>> mmCIF, but I’m using PDB format files generated by a popular Python module 
>>> and I wanted to check the output against a definitive format definition (if 
>>> that’s not tautology).
>>> 
>>> I noticed this because I was encouraged to try Moorhen and found that a HEM 
>>> (apparently written by this module) did not have the atoms connected with 
>>> bonds in the display.
>>> 
>>> I’m particularly interested in metal atoms here, and want to be 100% sure 
>>> that I’ve found a calcium, say, and not a C-alpha.
>>> 
>>> Q: Is it necessary to check columns 77-78 if I really want to be sure?
>>> 
>>> I’ve read the following, but can’t see anything obvious in “official” PDB 
>>> documentation that what it says here is actually defined anywhere:
>>> 
>>>> Atom names are composed of an atomic (element) symbol right-justified in
>>>> columns 13-14, and trailing identifying characters left-justified in
>>>> columns 15-16. A single-character element symbol should not appear in
>>>> column 13 unless the atom name has four characters (for example, see
>>>> Hydrogen Atoms). Many programs simply left-justify all atom names starting
>>>> in column 13. The difference can be seen clearly in a short segment of
>>>> hemoglobin (entry 3hhb):
>>>>
>>>> Correct:
>>>> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
>>>> HETATM 1072  CHA HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
>>>> HETATM 1073  CHB HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
>>>> HETATM 1074  CHC HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
>>>> HETATM 1075  CHD HEM A   1       6.928   4.145 -15.725  6.00 13.25           C
>>>>
>>>> Incorrect:
>>>> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
>>>> HETATM 1072 CHA  HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
>>>> HETATM 1073 CHB  HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
>>>> HETATM 1074 CHC  HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
>>>> HETATM 1075 CHD  HEM A   1       6.928   4.145 -15.725  6.00 13.25           C
>>>
>>> I’m sure that someone here will say “why don’t you look at *, it’s 
>>> obvious”, in which case - many thanks!
>>> 
>>> help
>>> 
>>> Harry
>>> 
>>> 
>>> 
> 
> 
> 





Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Harry Powell
Hi Ezra

Thanks for this. 

In other words, would it be true to say that there are no actual rules about 
what appears in columns 13-16 because “it's a rose by any other name”?

Harry

> On 15 May 2024, at 11:38, Ezra Peisach  wrote:
> 
> If you take a look at 
> https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM
> 
> you will see the following:
> 
> 77 - 78    LString(2)    element  Element symbol, right-justified.
> 
> Going by atom name will get you in trouble.  As you stated calcium vs Calpha. 
>  The element symbol comes from the chemical component dictionary.
> 
> 
> Ezra
> 
> 
> 
> On 5/15/24 6:28 AM, Harry Powell wrote:
>> Hi folks
>> 
>> I’m sure that this has been answered many times before (I’m sure that when I 
>> was young I even read it here…), and I *know* that we should all be using 
>> mmCIF, but I’m using PDB format files generated by a popular Python module 
>> and I wanted to check the output against a definitive format definition (if 
>> that’s not tautology).
>> 
>> I noticed this because I was encouraged to try Moorhen and found that a HEM 
>> (apparently written by this module) did not have the atoms connected with 
>> bonds in the display.
>> 
>> I’m particularly interested in metal atoms here, and want to be 100% sure 
>> that I’ve found a calcium, say, and not a C-alpha.
>> 
>> Q: Is it necessary to check columns 77-78 if I really want to be sure?
>> 
>> I’ve read the following, but can’t see anything obvious in “official” PDB 
>> documentation that what it says here is actually defined anywhere:
>> 
>>> Atom names are composed of an atomic (element) symbol right-justified in 
>>> columns 13-14, and trailing identifying characters left-justified in 
>>> columns 15-16. A single-character element symbol should not appear in 
>>> column 13 unless the atom name has four characters (for example, see 
>>> Hydrogen Atoms). Many programs simply left-justify all atom names starting 
>>> in column 13. The difference can be seen clearly in a short segment of 
>>> hemoglobin (entry 3hhb):
>>> 
>>> Correct:
>>> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
>>> HETATM 1072  CHA HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
>>> HETATM 1073  CHB HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
>>> HETATM 1074  CHC HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
>>> HETATM 1075  CHD HEM A   1       6.928   4.145 -15.725  6.00 13.25           C
>>>
>>> Incorrect:
>>> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
>>> HETATM 1072 CHA  HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
>>> HETATM 1073 CHB  HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
>>> HETATM 1074 CHC  HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
>>> HETATM 1075 CHD  HEM A   1       6.928   4.145 -15.725  6.00 13.25           C
>>
>> I’m sure that someone here will say “why don’t you look at *, it’s 
>> obvious”, in which case - many thanks!
>> 
>> help
>> 
>> Harry
>> 
>> 
>> 





Re: [ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Ezra Peisach
If you take a look at 
https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM


you will see the following:

77 - 78    LString(2)    element  Element symbol, right-justified.

Going by atom name will get you in trouble.  As you stated calcium vs 
Calpha.  The element symbol comes from the chemical component dictionary.
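
As a minimal illustration in Python, reading the element field rather than the atom name separates the two cases. The function names are hypothetical and the two sample records are made up, not taken from any entry.

```python
def element_of(line):
    """Return the element symbol from columns 77-78, or None if blank or missing."""
    symbol = line[76:78].strip()
    return symbol.upper() if symbol else None


def is_calcium(line):
    """True for an ATOM/HETATM record whose element field says calcium."""
    return line.startswith(("ATOM  ", "HETATM")) and element_of(line) == "CA"


# Both atoms are named CA; only columns 77-78 tell them apart reliably.
c_alpha = "ATOM      2  CA  ALA A   1      11.104   6.134  -6.504  1.00 12.34           C"
calcium = "HETATM  900 CA    CA A 201      20.000  21.000  22.000  1.00 30.00          CA"

print(is_calcium(c_alpha), is_calcium(calcium))
```

A record that stops before column 77 gives None from element_of, so the caller can tell that the file simply does not carry the information.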



Ezra



On 5/15/24 6:28 AM, Harry Powell wrote:

Hi folks

I’m sure that this has been answered many times before (I’m sure that when I 
was young I even read it here…), and I *know* that we should all be using 
mmCIF, but I’m using PDB format files generated by a popular Python module and 
I wanted to check the output against a definitive format definition (if that’s 
not tautology).

I noticed this because I was encouraged to try Moorhen and found that a HEM 
(apparently written by this module) did not have the atoms connected with bonds 
in the display.

I’m particularly interested in metal atoms here, and want to be 100% sure that 
I’ve found a calcium, say, and not a C-alpha.

Q: Is it necessary to check columns 77-78 if I really want to be sure?

I’ve read the following, but can’t see anything obvious in “official” PDB 
documentation that what it says here is actually defined anywhere:


Atom names are composed of an atomic (element) symbol right-justified in 
columns 13-14, and trailing identifying characters left-justified in columns 
15-16. A single-character element symbol should not appear in column 13 unless 
the atom name has four characters (for example, see Hydrogen Atoms). Many 
programs simply left-justify all atom names starting in column 13. The 
difference can be seen clearly in a short segment of hemoglobin (entry 3hhb):

Correct:
HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
HETATM 1072  CHA HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
HETATM 1073  CHB HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
HETATM 1074  CHC HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
HETATM 1075  CHD HEM A   1       6.928   4.145 -15.725  6.00 13.25           C

Incorrect:
HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
HETATM 1072 CHA  HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
HETATM 1073 CHB  HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
HETATM 1074 CHC  HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
HETATM 1075 CHD  HEM A   1       6.928   4.145 -15.725  6.00 13.25           C

I’m sure that someone here will say “why don’t you look at *, it’s 
obvious”, in which case - many thanks!

help

Harry





[ccp4bb] a dinosaur asks ... PDB format query

2024-05-15 Thread Harry Powell
Hi folks

I’m sure that this has been answered many times before (I’m sure that when I 
was young I even read it here…), and I *know* that we should all be using 
mmCIF, but I’m using PDB format files generated by a popular Python module and 
I wanted to check the output against a definitive format definition (if that’s 
not tautology).

I noticed this because I was encouraged to try Moorhen and found that a HEM 
(apparently written by this module) did not have the atoms connected with bonds 
in the display. 

I’m particularly interested in metal atoms here, and want to be 100% sure that 
I’ve found a calcium, say, and not a C-alpha.

Q: Is it necessary to check columns 77-78 if I really want to be sure?

I’ve read the following, but can’t see anything obvious in “official” PDB 
documentation that what it says here is actually defined anywhere:

> Atom names are composed of an atomic (element) symbol right-justified in 
> columns 13-14, and trailing identifying characters left-justified in columns 
> 15-16. A single-character element symbol should not appear in column 13 
> unless the atom name has four characters (for example, see Hydrogen Atoms). 
> Many programs simply left-justify all atom names starting in column 13. The 
> difference can be seen clearly in a short segment of hemoglobin (entry 3hhb):
> 
> Correct:
> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
> HETATM 1072  CHA HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
> HETATM 1073  CHB HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
> HETATM 1074  CHC HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
> HETATM 1075  CHD HEM A   1       6.928   4.145 -15.725  6.00 13.25           C
> 
> Incorrect:
> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
> HETATM 1072 CHA  HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
> HETATM 1073 CHB  HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
> HETATM 1074 CHC  HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
> HETATM 1075 CHD  HEM A   1       6.928   4.145 -15.725  6.00 13.25           C

I’m sure that someone here will say “why don’t you look at *, it’s 
obvious”, in which case - many thanks!

help

Harry


