Thanks for the prompt response, Greg. These are from the QM9 dataset (from the original paper). Do you know of any package that has already fixed them, by any chance? I used to use Chainer Chemistry to load the dataset, but it does not seem to include coordinate information.
Navid

On Wed, Oct 23, 2019 at 10:53 AM Greg Landrum <greg.land...@gmail.com> wrote:

> Given that those molecules are not chemically reasonable, I would suggest
> either fixing them by hand or removing them.
>
> On Wed, 23 Oct 2019 at 16:46, Navid Shervani-Tabar <nshe...@gmail.com>
> wrote:
>
>> Thanks Dan,
>>
>> There are two more issues after sanitizing:
>>
>> 1. For some molecules (e.g., c1ccnc1), I get the following error: Can't
>> kekulize mol. Unkekulized atoms: 0 1 2 3 4
>>
>> 2. For some molecules (e.g., C#CC#CN#N), I get the following error:
>> ValueError: Sanitization error: Explicit valence for atom # 4 N, 4, is
>> greater than permitted
>>
>> I asked a similar question a few weeks ago, where I got a similar error
>> while having SMILES as my input, but none of the suggestions helped. Should
>> I just get rid of these molecules?
>>
>> Thanks,
>> Navid
>>
>> On Tue, Oct 22, 2019 at 11:57 PM Dan Nealschneider <
>> dan.nealschnei...@schrodinger.com> wrote:
>>
>>> Navid-
>>> You probably need to "sanitize" the mol:
>>>
>>> rdkit.Chem.rdmolops.SanitizeMol(mol)
>>>
>>> *dan nealschneider* | senior developer
>>>
>>> On Tue, Oct 22, 2019 at 6:31 PM Navid Shervani-Tabar <nshe...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am trying to load a dataset using a vector of atoms (e.g.,
>>>> [6,6,7,6,6,8]) and the corresponding adjacency matrix. I am using the
>>>> following script to transform these into a mol object:
>>>>
>>>> def MolFromGraphs(node_list, adjacency_matrix):
>>>>
>>>>     # create empty editable mol object
>>>>     mol = Chem.RWMol()
>>>>
>>>>     # add atoms to mol and keep track of index
>>>>     node_to_idx = {}
>>>>     for i in range(len(node_list)):
>>>>         a = Chem.Atom(node_list[i].item())
>>>>         molIdx = mol.AddAtom(a)
>>>>         node_to_idx[i] = molIdx
>>>>
>>>>     # add bonds between adjacent atoms
>>>>     for ix, row in enumerate(adjacency_matrix):
>>>>         for iy, bond in enumerate(row):
>>>>
>>>>             # only traverse half the matrix
>>>>             if iy <= ix:
>>>>                 continue
>>>>
>>>>             # add relevant bond type (there are many more of these)
>>>>             if bond == 0:
>>>>                 continue
>>>>             elif bond == 1:
>>>>                 bond_type = Chem.rdchem.BondType.SINGLE
>>>>                 mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
>>>>             elif bond == 2:
>>>>                 bond_type = Chem.rdchem.BondType.DOUBLE
>>>>                 mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
>>>>             elif bond == 3:
>>>>                 bond_type = Chem.rdchem.BondType.TRIPLE
>>>>                 mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
>>>>             elif bond == 1.5:
>>>>                 bond_type = Chem.rdchem.BondType.AROMATIC
>>>>                 mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
>>>>
>>>>     # Convert RWMol to Mol object
>>>>     mol = mol.GetMol()
>>>>
>>>>     return mol
>>>>
>>>> When I try to get the hybridization of atoms using the mol object
>>>> generated from the function above, I get *UNSPECIFIED*.
>>>>
>>>> To make sure that this function works, I used *MolToSmiles* to
>>>> generate a SMILES string from the generated mol object, and it matched the
>>>> actual SMILES from the dataset. Interestingly, when I regenerate the mol
>>>> object from the SMILES that I already generated from the above function, I
>>>> can get the hybridization from the new mol object with no problem. I was
>>>> wondering if there is a flag or variable that I should set in the above
>>>> function to be able to get hybridization from the generated mol object.
>>>>
>>>> Thanks!
>>>> Navid
>>>>
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
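
For anyone following the thread, here is a minimal sketch of Dan's suggestion applied to the graph-to-mol function quoted above: build the molecule, then call SanitizeMol so that hybridization gets assigned. The name mol_from_graph, the sanitize flag, and the example adjacency matrix (a linear CCNCC=O chain for the atom vector [6,6,7,6,6,8]) are illustrative assumptions, not from the original posts, and the aromatic entry is simply carried over from the original function without being exercised here.

```python
from rdkit import Chem

# Bond orders as they appear in the adjacency matrix, mapped to RDKit bond types.
BOND_TYPES = {
    1: Chem.BondType.SINGLE,
    2: Chem.BondType.DOUBLE,
    3: Chem.BondType.TRIPLE,
    1.5: Chem.BondType.AROMATIC,  # carried over from the original; not exercised below
}


def mol_from_graph(atomic_nums, adjacency, sanitize=True):
    """Build a Mol from atomic numbers and a symmetric bond-order matrix."""
    rw = Chem.RWMol()
    for z in atomic_nums:
        rw.AddAtom(Chem.Atom(int(z)))

    n = len(atomic_nums)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only
            order = adjacency[i][j]
            if order:
                rw.AddBond(i, j, BOND_TYPES[order])

    mol = rw.GetMol()
    if sanitize:
        # Raises for molecules RDKit considers invalid (e.g. bad valences).
        Chem.SanitizeMol(mol)
    return mol


# Hypothetical example graph: C-C-N-C-C=O
atoms = [6, 6, 7, 6, 6, 8]
adj = [[0] * 6 for _ in range(6)]
for i, j, order in [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 2)]:
    adj[i][j] = adj[j][i] = order

raw = mol_from_graph(atoms, adj, sanitize=False)
clean = mol_from_graph(atoms, adj)

raw_hyb = [a.GetHybridization() for a in raw.GetAtoms()]
clean_hyb = [a.GetHybridization() for a in clean.GetAtoms()]
print([str(h) for h in raw_hyb])
print([str(h) for h in clean_hyb])
```

If raising is inconvenient when looping over a whole dataset, SanitizeMol also accepts catchErrors=True and then returns the sanitization step that failed instead of throwing, which makes it easy to collect the problem molecules for hand-fixing or removal.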
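
On the second error in the thread (explicit valence for atom # 4 N, 4), a rough way to find such entries before they reach RDKit is to sum the bond orders in each row of the adjacency matrix and compare against a per-element limit. The sketch below is plain Python, not an RDKit feature; the MAX_VALENCE table assumes neutral, uncharged H, C, N, O and F only, and the function names are made up for illustration. A flagged atom does not say how to fix the molecule, only that it needs a charge assignment by hand or should be dropped, as Greg suggested.

```python
# Assumed maximum valences for neutral atoms (H, C, N, O, F only).
MAX_VALENCE = {1: 1, 6: 4, 7: 3, 8: 2, 9: 1}


def adjacency_from_bonds(n_atoms, bonds):
    """Build a symmetric bond-order matrix from (i, j, order) triples."""
    adj = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        adj[i][j] = adj[j][i] = order
    return adj


def overvalent_atoms(atomic_nums, adjacency):
    """Return (index, atomic number, explicit valence) for atoms over the limit."""
    flagged = []
    for i, z in enumerate(atomic_nums):
        valence = sum(adjacency[i])  # explicit bonds only; hydrogens not counted
        if valence > MAX_VALENCE[z]:
            flagged.append((i, z, valence))
    return flagged


# Heavy-atom graph for the SMILES from the thread: C#CC#CN#N
atoms = [6, 6, 6, 6, 7, 7]
adj = adjacency_from_bonds(
    6, [(0, 1, 3), (1, 2, 1), (2, 3, 3), (3, 4, 1), (4, 5, 3)]
)
flagged = overvalent_atoms(atoms, adj)
print(flagged)
```

The check only sees the bonds present in the matrix, so it can miss atoms that become over-valent once implicit hydrogens are considered, and it will wrongly flag legitimately charged atoms if the dataset stores formal charges elsewhere.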