Thanks for the prompt response, Greg. These are from the QM9 dataset (from the
original paper). Do you know of any package that has already fixed them, by
any chance? I used to load the dataset with Chainer Chemistry, but its
version of the data does not seem to include coordinate information.

Navid

On Wed, Oct 23, 2019 at 10:53 AM Greg Landrum <greg.land...@gmail.com>
wrote:

>
> Given that those molecules are not chemically reasonable, I would suggest
> either fixing them by hand or removing them.
>
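One way to act on the "removing them" option is to sanitize each molecule inside a try/except and drop the ones that fail. This is only a minimal sketch, assuming the molecules are available as SMILES; the helper name `keep_sanitizable` is made up for illustration. The last input, `c1cc[nH]c1` (pyrrole), is a guess at what a hand-fixed `c1ccnc1` might look like, not something confirmed by the dataset.

```python
from rdkit import Chem
from rdkit import RDLogger

# Silence RDKit's error log so failures are reported only through return values.
RDLogger.DisableLog("rdApp.*")


def keep_sanitizable(smiles_list):
    """Split SMILES into those RDKit can sanitize and those it rejects."""
    kept, rejected = [], []
    for smi in smiles_list:
        # Parse without sanitization so that syntax errors and
        # chemistry errors can be told apart.
        mol = Chem.MolFromSmiles(smi, sanitize=False)
        if mol is None:
            rejected.append(smi)
            continue
        try:
            Chem.SanitizeMol(mol)
        except ValueError:
            # Kekulization and valence failures are raised as ValueError
            # (or subclasses of it, depending on the RDKit version).
            rejected.append(smi)
        else:
            kept.append(smi)
    return kept, rejected


kept, rejected = keep_sanitizable(["CCO", "c1ccnc1", "C#CC#CN#N", "c1cc[nH]c1"])
print("kept:", kept)
print("rejected:", rejected)
```

Whether dropping or repairing is appropriate depends on how many molecules are affected and on what the downstream model needs.
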
> On Wed, 23 Oct 2019 at 16:46, Navid Shervani-Tabar <nshe...@gmail.com>
> wrote:
>
>> Thanks Dan,
>>
>> There are two more issues after sanitizing:
>>
>> 1. For some molecules (e.g., c1ccnc1), I get the following error: Can't
>> kekulize mol.  Unkekulized atoms: 0 1 2 3 4
>>
>> 2. For some molecules (e.g., C#CC#CN#N), I get the following error:
>> ValueError: Sanitization error: Explicit valence for atom # 4 N, 4, is
>> greater than permitted
>>
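The second error can be checked by hand from the graph representation: an atom's explicit valence is the sum of the bond orders in its row of the adjacency matrix. The sketch below assumes neutral atoms and uses a hypothetical `MAX_VALENCE` table restricted to H, C, N, O and F. It is not RDKit's own valence model, which also takes formal charges into account.

```python
# Maximum valence of neutral atoms, keyed by atomic number (illustrative table).
MAX_VALENCE = {1: 1, 6: 4, 7: 3, 8: 2, 9: 1}


def overvalent_atoms(atomic_numbers, adjacency):
    """Return (index, atomic_number, valence) for atoms exceeding MAX_VALENCE."""
    problems = []
    for idx, (z, row) in enumerate(zip(atomic_numbers, adjacency)):
        valence = sum(row)  # bond orders: 1, 2, 3, or 1.5 for aromatic
        if valence > MAX_VALENCE[z]:
            problems.append((idx, z, valence))
    return problems


# C#CC#CN#N as a graph: atoms 0-3 are carbon, atoms 4-5 are nitrogen.
atoms = [6, 6, 6, 6, 7, 7]
adj = [
    [0, 3, 0, 0, 0, 0],
    [3, 0, 1, 0, 0, 0],
    [0, 1, 0, 3, 0, 0],
    [0, 0, 3, 0, 1, 0],
    [0, 0, 0, 1, 0, 3],
    [0, 0, 0, 0, 3, 0],
]
print(overvalent_atoms(atoms, adj))  # [(4, 7, 4)]
```

Atom 4 is the nitrogen with one single and one triple bond, which gives the valence of 4 reported in the error message.
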
>> I asked a similar question a few weeks ago, when I got a similar error
>> with SMILES as my input, but none of the suggestions helped. Should
>> I just get rid of these molecules?
>>
>> Thanks,
>> Navid
>>
>> On Tue, Oct 22, 2019 at 11:57 PM Dan Nealschneider <
>> dan.nealschnei...@schrodinger.com> wrote:
>>
>>> Navid-
>>> You probably need to "sanitize" the mol:
>>>
>>> rdkit.Chem.rdmolops.SanitizeMol(mol)
>>>
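For a molecule assembled through `RWMol`, the call would go after the last `AddBond`. As a small sketch, ethene built by hand (a two-atom example chosen only for illustration, not taken from the dataset) shows the hybridization changing from unspecified to SP2 once the molecule is sanitized:

```python
from rdkit import Chem

# Build ethene (C=C) atom by atom, the same way MolFromGraphs does.
mol = Chem.RWMol()
c1 = mol.AddAtom(Chem.Atom(6))
c2 = mol.AddAtom(Chem.Atom(6))
mol.AddBond(c1, c2, Chem.BondType.DOUBLE)

# Freshly added atoms carry no perceived hybridization yet.
before = [str(a.GetHybridization()) for a in mol.GetAtoms()]

# Sanitization works in place: it computes valences and perceives
# aromaticity, conjugation and hybridization.
Chem.SanitizeMol(mol)

after = [str(a.GetHybridization()) for a in mol.GetAtoms()]
print(before)
print(after)
```
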
>>> *dan nealschneider* | senior developer
>>>
>>> On Tue, Oct 22, 2019 at 6:31 PM Navid Shervani-Tabar <nshe...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am trying to load a dataset using a vector of atoms (e.g.,
>>>> [6,6,7,6,6,8]) and the corresponding adjacency matrix. I am using the
>>>> following script to transform these into a mol object:
>>>>
>>>> from rdkit import Chem
>>>>
>>>> def MolFromGraphs(node_list, adjacency_matrix):
>>>>
>>>>     # create empty editable mol object
>>>>     mol = Chem.RWMol()
>>>>
>>>>     # add atoms to mol and keep track of index
>>>>     node_to_idx = {}
>>>>     for i in range(len(node_list)):
>>>>         a = Chem.Atom(node_list[i].item())
>>>>         molIdx = mol.AddAtom(a)
>>>>         node_to_idx[i] = molIdx
>>>>
>>>>     # add bonds between adjacent atoms
>>>>     for ix, row in enumerate(adjacency_matrix):
>>>>         for iy, bond in enumerate(row):
>>>>
>>>>             # only traverse half the matrix
>>>>             if iy <= ix:
>>>>                 continue
>>>>
>>>>             # add relevant bond type (there are many more of these)
>>>>             if bond == 0:
>>>>                 continue
>>>>             elif bond == 1:
>>>>                 bond_type = Chem.rdchem.BondType.SINGLE
>>>>                 mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
>>>>             elif bond == 2:
>>>>                 bond_type = Chem.rdchem.BondType.DOUBLE
>>>>                 mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
>>>>             elif bond == 3:
>>>>                 bond_type = Chem.rdchem.BondType.TRIPLE
>>>>                 mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
>>>>             elif bond == 1.5:
>>>>                 bond_type = Chem.rdchem.BondType.AROMATIC
>>>>                 mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
>>>>
>>>>     # Convert RWMol to Mol object
>>>>     mol = mol.GetMol()
>>>>
>>>>     return mol
>>>>
>>>>
>>>> When I try to get the hybridization of atoms from the mol object
>>>> generated by the function above, I get *UNSPECIFIED*.
>>>>
>>>> To make sure that the function works, I used *MolToSmiles* to
>>>> generate a SMILES string from the generated mol object, and it matched the
>>>> actual SMILES from the dataset. Interestingly, when I regenerate the mol
>>>> object from that SMILES string, I can get the hybridization from the new
>>>> mol object with no problem. Is there a flag or variable that I should set
>>>> in the above function to be able to get the hybridization from the
>>>> generated mol object?
>>>>
>>>> Thanks!
>>>> Navid
>>>>
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>
>>
>