Dear Paolo,

Great, that makes sense. Thanks so much for the explanation and solution!
Susan

On Wed, Dec 9, 2020 at 9:21 PM Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:

> Dear Susan,
>
> the reason is that PDBAtomLine() ignores records where the alternate
> location is different from ' ', 'A' or '1':
>
> https://github.com/rdkit/rdkit/blob/e7e17adc4ef822d2663fa6e1ba5b978512c7a8b4/Code/GraphMol/FileParsers/PDBParser.cpp#L62
>
> I have run into PDB files in the past that have a single alternate
> location which is none of the above (e.g., only 'B').
> To avoid the problem you may set flavor=1 in MolFromPDBBlock():
>
> m_altlocB = Chem.MolFromPDBBlock("""
> HETATM 7886  C1 BG5K C 702      -0.945  12.634  14.174  0.51 48.52           C
> HETATM 7887  C2 BG5K C 702      -0.880  12.457  12.854  0.51 49.59           C
> HETATM 7888  C3 BG5K C 702      -2.175  12.105  12.250  0.51 50.84           C
> HETATM 7889  O4 BG5K C 702      -3.162  12.016  12.973  0.51 56.22           O
> HETATM 7890  N5 BG5K C 702      -2.196  11.880  10.884  0.51 46.36           N
> HETATM 7891  C6 BG5K C 702      -3.381  11.554  10.123  0.51 42.61           C
> HETATM 7892  C7 BG5K C 702      -2.994  11.820   8.683  0.51 34.05           C
> HETATM 7893  C8 BG5K C 702      -1.767  12.664   8.829  0.51 37.06           C
> HETATM 7894  C9 BG5K C 702      -1.088  12.006   9.994  0.51 39.00           C
> HETATM 7895  N10BG5K C 702      -0.970  12.778   7.632  0.51 32.64           N
> HETATM 7896  C11BG5K C 702      -0.892  13.901   6.897  0.51 31.98           C
> HETATM 7897  N12BG5K C 702       0.291  14.135   6.316  0.51 32.89           N
> HETATM 7898  C13BG5K C 702       0.365  15.236   5.562  0.51 37.92           C
> HETATM 7899  C14BG5K C 702       1.667  15.577   4.898  0.51 45.61           C
> HETATM 7900  N15BG5K C 702       2.013  14.870   3.674  0.51 54.36           N
> HETATM 7901  C16BG5K C 702       2.736  13.630   3.932  0.51 61.64           C
> HETATM 7902  C17BG5K C 702       3.129  13.030   2.594  0.51 67.85           C
> HETATM 7903  O18BG5K C 702       1.993  12.804   1.757  0.51 70.80           O
> HETATM 7904  C19BG5K C 702       1.368  14.060   1.475  0.51 61.74           C
> HETATM 7905  C20BG5K C 702       0.884  14.647   2.784  0.51 59.26           C
> HETATM 7906  C21BG5K C 702      -0.704  16.091   5.410  0.51 30.62           C
> HETATM 7907  C22BG5K C 702      -1.867  15.765   6.060  0.51 27.18           C
> HETATM 7908  N23BG5K C 702      -3.002  16.585   5.957  0.51 24.96           N
> HETATM 7909  C24BG5K C 702      -4.148  16.590   6.655  0.51 25.09           C
> HETATM 7910  N25BG5K C 702      -5.074  17.504   6.513  0.51 23.42           N
> HETATM 7911  C26BG5K C 702      -6.122  17.226   7.351  0.51 23.35           C
> HETATM 7912  C27BG5K C 702      -7.284  17.952   7.501  0.51 24.35           C
> HETATM 7913  C28BG5K C 702      -8.201  17.482   8.421  0.51 25.73           C
> HETATM 7914  C29BG5K C 702      -7.973  16.341   9.153  0.51 25.63           C
> HETATM 7915  N30BG5K C 702      -6.855  15.637   9.017  0.51 29.00           N
> HETATM 7916  C31BG5K C 702      -5.966  16.100   8.129  0.51 25.96           C
> HETATM 7917  S32BG5K C 702      -4.462  15.337   7.792  0.51 27.85           S
> HETATM 7918  N33BG5K C 702      -2.005  14.665   6.815  0.51 30.08           N
> """, flavor=1)
> Chem.MolToSmiles(m_altlocB)
>
> 'CC(O)N1CC[C@H](NC2NC(CN3CCOCC3)CC(NC3NC4CCCNC4S3)N2)C1'
>
> Cheers,
> p.
>
> On Wed, Dec 9, 2020 at 6:07 PM Susan Leung <susanhle...@gmail.com> wrote:
>
>> Hi all!
>>
>> I'm having problems reading in a PDB file with altloc B. See the code
>> below, which tries to read in residue 702, chain C of 4kio.pdb
>> (https://files.rcsb.org/view/4KIO.pdb). A mol object is returned but it
>> hasn’t got anything in it (or at least it returns an empty string when I
>> call Chem.MolToSmiles() on it).
>>
>> In [1]: import rdkit
>>
>> In [2]: from rdkit import Chem
>>
>> In [3]: rdkit.__version__
>> Out[3]: '2019.03.2'
>>
>> In [4]: m_altlocB = Chem.MolFromPDBBlock("""
>>    ...: HETATM 7886  C1 BG5K C 702      -0.945  12.634  14.174  0.51 48.52           C
>>    ...: HETATM 7887  C2 BG5K C 702      -0.880  12.457  12.854  0.51 49.59           C
>>    ...: HETATM 7888  C3 BG5K C 702      -2.175  12.105  12.250  0.51 50.84           C
>>    ...: HETATM 7889  O4 BG5K C 702      -3.162  12.016  12.973  0.51 56.22           O
>>    ...: HETATM 7890  N5 BG5K C 702      -2.196  11.880  10.884  0.51 46.36           N
>>    ...: HETATM 7891  C6 BG5K C 702      -3.381  11.554  10.123  0.51 42.61           C
>>    ...: HETATM 7892  C7 BG5K C 702      -2.994  11.820   8.683  0.51 34.05           C
>>    ...: HETATM 7893  C8 BG5K C 702      -1.767  12.664   8.829  0.51 37.06           C
>>    ...: HETATM 7894  C9 BG5K C 702      -1.088  12.006   9.994  0.51 39.00           C
>>    ...: HETATM 7895  N10BG5K C 702      -0.970  12.778   7.632  0.51 32.64           N
>>    ...: HETATM 7896  C11BG5K C 702      -0.892  13.901   6.897  0.51 31.98           C
>>    ...: HETATM 7897  N12BG5K C 702       0.291  14.135   6.316  0.51 32.89           N
>>    ...: HETATM 7898  C13BG5K C 702       0.365  15.236   5.562  0.51 37.92           C
>>    ...: HETATM 7899  C14BG5K C 702       1.667  15.577   4.898  0.51 45.61           C
>>    ...: HETATM 7900  N15BG5K C 702       2.013  14.870   3.674  0.51 54.36           N
>>    ...: HETATM 7901  C16BG5K C 702       2.736  13.630   3.932  0.51 61.64           C
>>    ...: HETATM 7902  C17BG5K C 702       3.129  13.030   2.594  0.51 67.85           C
>>    ...: HETATM 7903  O18BG5K C 702       1.993  12.804   1.757  0.51 70.80           O
>>    ...: HETATM 7904  C19BG5K C 702       1.368  14.060   1.475  0.51 61.74           C
>>    ...: HETATM 7905  C20BG5K C 702       0.884  14.647   2.784  0.51 59.26           C
>>    ...: HETATM 7906  C21BG5K C 702      -0.704  16.091   5.410  0.51 30.62           C
>>    ...: HETATM 7907  C22BG5K C 702      -1.867  15.765   6.060  0.51 27.18           C
>>    ...: HETATM 7908  N23BG5K C 702      -3.002  16.585   5.957  0.51 24.96           N
>>    ...: HETATM 7909  C24BG5K C 702      -4.148  16.590   6.655  0.51 25.09           C
>>    ...: HETATM 7910  N25BG5K C 702      -5.074  17.504   6.513  0.51 23.42           N
>>    ...: HETATM 7911  C26BG5K C 702      -6.122  17.226   7.351  0.51 23.35           C
>>    ...: HETATM 7912  C27BG5K C 702      -7.284  17.952   7.501  0.51 24.35           C
>>    ...: HETATM 7913  C28BG5K C 702      -8.201  17.482   8.421  0.51 25.73           C
>>    ...: HETATM 7914  C29BG5K C 702      -7.973  16.341   9.153  0.51 25.63           C
>>    ...: HETATM 7915  N30BG5K C 702      -6.855  15.637   9.017  0.51 29.00           N
>>    ...: HETATM 7916  C31BG5K C 702      -5.966  16.100   8.129  0.51 25.96           C
>>    ...: HETATM 7917  S32BG5K C 702      -4.462  15.337   7.792  0.51 27.85           S
>>    ...: HETATM 7918  N33BG5K C 702      -2.005  14.665   6.815  0.51 30.08           N
>>    ...: """)
>>
>> In [5]: Chem.MolToSmiles(m_altlocB)
>> Out[5]: ''
>>
>> But if I change the altloc column which is column 17 from B to A then it
>> reads in and prints the SMILES fine.
>>
>> In [6]: m_altlocA = Chem.MolFromPDBBlock("""
>>    ...: HETATM 7886  C1 AG5K C 702      -0.945  12.634  14.174  0.51 48.52           C
>>    ...: HETATM 7887  C2 AG5K C 702      -0.880  12.457  12.854  0.51 49.59           C
>>    ...: HETATM 7888  C3 AG5K C 702      -2.175  12.105  12.250  0.51 50.84           C
>>    ...: HETATM 7889  O4 AG5K C 702      -3.162  12.016  12.973  0.51 56.22           O
>>    ...: HETATM 7890  N5 AG5K C 702      -2.196  11.880  10.884  0.51 46.36           N
>>    ...: HETATM 7891  C6 AG5K C 702      -3.381  11.554  10.123  0.51 42.61           C
>>    ...: HETATM 7892  C7 AG5K C 702      -2.994  11.820   8.683  0.51 34.05           C
>>    ...: HETATM 7893  C8 AG5K C 702      -1.767  12.664   8.829  0.51 37.06           C
>>    ...: HETATM 7894  C9 AG5K C 702      -1.088  12.006   9.994  0.51 39.00           C
>>    ...: HETATM 7895  N10AG5K C 702      -0.970  12.778   7.632  0.51 32.64           N
>>    ...: HETATM 7896  C11AG5K C 702      -0.892  13.901   6.897  0.51 31.98           C
>>    ...: HETATM 7897  N12AG5K C 702       0.291  14.135   6.316  0.51 32.89           N
>>    ...: HETATM 7898  C13AG5K C 702       0.365  15.236   5.562  0.51 37.92           C
>>    ...: HETATM 7899  C14AG5K C 702       1.667  15.577   4.898  0.51 45.61           C
>>    ...: HETATM 7900  N15AG5K C 702       2.013  14.870   3.674  0.51 54.36           N
>>    ...: HETATM 7901  C16AG5K C 702       2.736  13.630   3.932  0.51 61.64           C
>>    ...: HETATM 7902  C17AG5K C 702       3.129  13.030   2.594  0.51 67.85           C
>>    ...: HETATM 7903  O18AG5K C 702       1.993  12.804   1.757  0.51 70.80           O
>>    ...: HETATM 7904  C19AG5K C 702       1.368  14.060   1.475  0.51 61.74           C
>>    ...: HETATM 7905  C20AG5K C 702       0.884  14.647   2.784  0.51 59.26           C
>>    ...: HETATM 7906  C21AG5K C 702      -0.704  16.091   5.410  0.51 30.62           C
>>    ...: HETATM 7907  C22AG5K C 702      -1.867  15.765   6.060  0.51 27.18           C
>>    ...: HETATM 7908  N23AG5K C 702      -3.002  16.585   5.957  0.51 24.96           N
>>    ...: HETATM 7909  C24AG5K C 702      -4.148  16.590   6.655  0.51 25.09           C
>>    ...: HETATM 7910  N25AG5K C 702      -5.074  17.504   6.513  0.51 23.42           N
>>    ...: HETATM 7911  C26AG5K C 702      -6.122  17.226   7.351  0.51 23.35           C
>>    ...: HETATM 7912  C27AG5K C 702      -7.284  17.952   7.501  0.51 24.35           C
>>    ...: HETATM 7913  C28AG5K C 702      -8.201  17.482   8.421  0.51 25.73           C
>>    ...: HETATM 7914  C29AG5K C 702      -7.973  16.341   9.153  0.51 25.63           C
>>    ...: HETATM 7915  N30AG5K C 702      -6.855  15.637   9.017  0.51 29.00           N
>>    ...: HETATM 7916  C31AG5K C 702      -5.966  16.100   8.129  0.51 25.96           C
>>    ...: HETATM 7917  S32AG5K C 702      -4.462  15.337   7.792  0.51 27.85           S
>>    ...: HETATM 7918  N33AG5K C 702      -2.005  14.665   6.815  0.51 30.08           N
>>    ...: """)
>>
>> In [7]: Chem.MolToSmiles(m_altlocA)
>> Out[7]: 'CC(O)N1CC[C@H](NC2NC(CN3CCOCC3)CC(NC3NC4CCCNC4S3)N2)C1'
>>
>> Am I doing something wrong? I also attach this code as a .ipynb.
>>
>> Many thanks in advance!
>>
>> Susan
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
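For readers who want to see the rule from this thread in isolation, here is a minimal pure-Python sketch of the altloc filter as Paolo describes it. It is only an illustration and not RDKit's code: the helper names keeps_record and filter_block are made up for this example, and the treatment of flavor (bit 1 switches the filter off) is an assumption based on the flavor=1 suggestion above.

```python
# Illustrative sketch only: restates the altloc rule described in this thread.
# keeps_record() and filter_block() are hypothetical helpers, not RDKit API.

def keeps_record(line: str, flavor: int = 0) -> bool:
    """Return True if an ATOM/HETATM record passes the altloc filter."""
    if flavor & 1:
        # Assumption from the thread: flavor=1 turns the altloc filter off.
        return True
    if len(line) < 17:
        # Line too short to carry an altloc column, so nothing to filter on.
        return True
    # Column 17 (index 16) holds the alternate location indicator.
    return line[16] in (" ", "A", "1")


def filter_block(block: str, flavor: int = 0) -> list:
    """Keep only the ATOM/HETATM lines that pass the altloc filter."""
    return [
        line
        for line in block.splitlines()
        if line.startswith(("ATOM", "HETATM")) and keeps_record(line, flavor)
    ]


# First record of the ligand from the thread, with altloc 'B' and altloc 'A'.
ALTLOC_B = "HETATM 7886  C1 BG5K C 702      -0.945  12.634  14.174  0.51 48.52           C"
ALTLOC_A = "HETATM 7886  C1 AG5K C 702      -0.945  12.634  14.174  0.51 48.52           C"

print(len(filter_block(ALTLOC_B)))            # altloc 'B' is dropped by default
print(len(filter_block(ALTLOC_B, flavor=1)))  # kept once the filter is off
print(len(filter_block(ALTLOC_A)))            # altloc 'A' passes either way
```

Under these assumptions, a block that contains only altloc 'B' records loses every atom with the default flavor, which is consistent with the empty SMILES reported in the original question.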