On Fri, Jan 4, 2019 at 1:59 PM Jason Ochoada <jocho...@gmail.com> wrote:
>
> Thanks so much for taking the time to help! I didn't realize the size
> limit recommendation for pandas so maybe that's why I don't see much of it.
>

Yeah, Pandas is designed to keep the entire dataframe in memory. This makes
things tricky with large datasets.
For similarity searches across large sets, chemfp is really a great way to
go (and it's even better if you license the commercial version).

> I often work on a much larger scale and was investigating moving from
> KNIME to RDKit on Linux for that reason. The curve is just steep right now
> :) learning python, pandas, RDKit etc. all at once! I'll start
> digging/searching for the more traditional straight python ways to do the
> same.
>

Yeah, there's a lot there to pick up, but it sounds like you're making a
great start... good luck with it and please do keep asking questions as you
encounter problems!

-greg

> Thanks again for the info and help!
> Jason
> St. Jude Children's Research Hospital
>
> On Fri, Jan 4, 2019 at 1:34 AM Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> Hi Jason,
>>
>> This gist shows how to generate fingerprints for the molecules in a
>> pandas dataframe and then use them to do similarity searches:
>> https://gist.github.com/greglandrum/045ccf8009fde91fc985864e70ee72a1
>>
>> This is a reasonably efficient way of working with a smallish (<10K)
>> number of molecules.
>>
>> -greg
>>
>>
>> On Thu, Jan 3, 2019 at 7:10 PM Jason Ochoada <jocho...@gmail.com> wrote:
>>
>>> Hi Everyone!
>>>
>>> I'm a newbie making the shift from RDKit in KNIME to working with the
>>> full package. I have been working (hacking) my way through the tutorials
>>> I could find on pandas, Jupyter, RDKit etc. I'm using RDKit in the
>>> anaconda 3 environment. I'm struggling to figure out how to do what I
>>> imagine is a very simple task. I have read in a flat file (Smiles file)
>>> and have it in a pandas data frame named cpds. It contained SMILES and
>>> ID. I have been able to add a molecule to the dataframe:
>>>
>>> PandasTools.AddMoleculeColumnToFrame(cpds,'SMILES','Molecule',includeFingerprints=False)
>>> print([str(x) for x in cpds.columns])
>>>
>>> But I can't seem to figure out how to create and append a fingerprint.
>>> I'm open to any options as I'm new and don't have any particular structure
>>> I like to work in. Of course once I have this I'd like to do similarity
>>> searches either in RDKit or chemfp etc. someday.
>>>
>>> Can you point me to where this might have been done? I've searched and
>>> searched but I can't seem to find a solution that will work for me.
>>>
>>> Thanks,
>>> Jason Ochoada
>>> St. Jude Children's Research Hospital
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
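
For readers following the thread, here is a minimal sketch of the general idea discussed above: add a fingerprint column to a pandas dataframe of SMILES and run a similarity search against it. It is not a copy of the linked gist. The example compounds, the query molecule, the 'FP' and 'Similarity' column names, and the choice of Morgan fingerprints (radius 2, 2048 bits) are all assumptions made for illustration, and the molecules are parsed directly with Chem.MolFromSmiles rather than through PandasTools.

```python
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Hypothetical stand-in for the `cpds` frame read from the SMILES file.
cpds = pd.DataFrame({
    "ID": ["benzene", "toluene", "phenol", "ethanol"],
    "SMILES": ["c1ccccc1", "Cc1ccccc1", "Oc1ccccc1", "CCO"],
})

# Parse each SMILES; MolFromSmiles returns None when parsing fails,
# so drop those rows before fingerprinting.
cpds["Molecule"] = [Chem.MolFromSmiles(smi) for smi in cpds["SMILES"]]
cpds = cpds[cpds["Molecule"].notna()].reset_index(drop=True)

# Assumed fingerprint settings: Morgan, radius 2, 2048 bits.
fpgen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
cpds["FP"] = [fpgen.GetFingerprint(mol) for mol in cpds["Molecule"]]

# Similarity search: Tanimoto similarity of one query against every row.
query_fp = fpgen.GetFingerprint(Chem.MolFromSmiles("Cc1ccccc1"))
cpds["Similarity"] = DataStructs.BulkTanimotoSimilarity(query_fp, list(cpds["FP"]))

# Rank the compounds by similarity to the query, most similar first.
hits = cpds.sort_values("Similarity", ascending=False)
print(hits[["ID", "SMILES", "Similarity"]])
```

As noted in the thread, keeping everything in a dataframe like this is only reasonable for a smallish (<10K) number of molecules; for larger sets, chemfp is the suggested route.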