Re: [Rdkit-discuss] Generating Fingerprints from Smiles or Mol

Greg Landrum Fri, 04 Jan 2019 23:23:17 -0800

On Fri, Jan 4, 2019 at 1:59 PM Jason Ochoada <jocho...@gmail.com> wrote:


>
> Thanks so much for taking the time to help!  I didn't realize there was a
> recommended size limit for pandas, so maybe that's why I don't see much of it.
>

Yeah, pandas is designed to keep the entire dataframe in memory, which makes
things tricky with large datasets.
For similarity searches across large sets, chemfp is really a great way to
go (and it's even better if you license the commercial version).
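
To make the memory point concrete, here is a minimal sketch of a similarity search that streams through a SMILES file instead of loading everything into a dataframe. It is not the approach from the gist and not how chemfp works internally; the function names (tanimoto, top_k_similar, fingerprints_from_smiles_file), the assumed file layout (whitespace-separated SMILES then ID on each line), and the Morgan radius 2 / 2048-bit settings are all illustrative assumptions. Only the last function needs RDKit.

```python
import heapq
from typing import FrozenSet, Iterable, Iterator, List, Tuple

Fingerprint = FrozenSet[int]  # a fingerprint stored as the set of its "on" bit indices


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    shared = len(a & b)
    total = len(a) + len(b) - shared
    return shared / total if total else 0.0


def top_k_similar(query: Fingerprint,
                  library: Iterable[Tuple[str, Fingerprint]],
                  k: int = 5) -> List[Tuple[float, str]]:
    """Return the k best (score, id) pairs, highest score first.

    heapq.nlargest consumes the iterable one item at a time and keeps only
    k candidates, so the library never has to fit in memory at once.
    """
    scored = ((tanimoto(query, fp), cpd_id) for cpd_id, fp in library)
    return heapq.nlargest(k, scored)


def fingerprints_from_smiles_file(path: str, radius: int = 2,
                                  n_bits: int = 2048) -> Iterator[Tuple[str, Fingerprint]]:
    """Lazily yield (id, fingerprint) pairs from a SMILES file (requires RDKit).

    Assumes each line is "<SMILES> <ID>"; lines that do not parse
    (including a header line, if any) are skipped.
    """
    from rdkit import Chem
    from rdkit.Chem import AllChem

    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) < 2:
                continue
            mol = Chem.MolFromSmiles(parts[0])
            if mol is None:
                continue
            fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
            yield parts[1], frozenset(fp.GetOnBits())
```

With a query fingerprint built the same way, a call such as top_k_similar(query_fp, fingerprints_from_smiles_file("cpds.smi"), k=10) would scan the file once (the file name here is hypothetical). Pure-Python set arithmetic is slow compared with RDKit's own DataStructs similarity functions or chemfp, so treat this as an illustration of the streaming pattern rather than something to run on millions of compounds.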


>   I often work on a much larger scale and was investigating moving from
> KNIME to RDKit on Linux for that reason.  The curve is just steep right now
> :) learning Python, pandas, RDKit, etc. all at once!  I'll start
> digging/searching for the more traditional straight Python ways to do the
> same.
>

Yeah, there's a lot there to pick up, but it sounds like you're making a
great start... good luck with it and please do keep asking questions as you
encounter problems!

-greg



> Thanks again for the info and help!
> Jason
> St. Jude Children's Research Hospital
>
> On Fri, Jan 4, 2019 at 1:34 AM Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> Hi Jason,
>>
>> This gist shows how to generate fingerprints for the molecules in a
>> pandas dataframe and then use them to do similarity searches:
>> https://gist.github.com/greglandrum/045ccf8009fde91fc985864e70ee72a1
>>
>> This is a reasonably efficient way of working with a smallish (<10K)
>> number of molecules.
>>
>> -greg
>>
>>
>> On Thu, Jan 3, 2019 at 7:10 PM Jason Ochoada <jocho...@gmail.com> wrote:
>>
>>> Hi Everyone!
>>>
>>> I'm a newbie making the shift from RDKit in KNIME to working with the
>>> full package.  I have been working (hacking) my way through the tutorials I
>>> could find on pandas, Jupyter, RDKit, etc.  I'm using RDKit in the Anaconda 3
>>> environment.  I'm struggling to figure out how to do what I imagine is a
>>> very simple task.  I have read in a flat file (SMILES file) and have it in
>>> a pandas data frame named cpds.  It contains SMILES and ID columns.  I have
>>> been able to add a molecule column to the dataframe:
>>>
>>>
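>>> from rdkit.Chem import PandasTools
>>> # Parse the 'SMILES' column into RDKit molecules stored in a new 'Molecule' column
>>> PandasTools.AddMoleculeColumnToFrame(cpds, 'SMILES', 'Molecule', includeFingerprints=False)
>>> print([str(x) for x in cpds.columns])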
>>>
>>> But I can't seem to figure out how to create and append a fingerprint.
>>> I'm open to any options as I'm new and don't have any particular structure
>>> I like to work in.  Of course once I have this I'd like to do similarity
>>> searches either in RDKit or chemfp etc. someday.
>>>
>>> Can you point me to where this might have been done?  I've searched and
>>> searched but I can't seem to find a solution that will work for me.
>>>
>>> Thanks,
>>> Jason Ochoada
>>> St. Jude Children's Research Hospital
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
