Re: [Rdkit-discuss] Pandas to Excel

2024-02-22 Thread Chris Swain via Rdkit-discuss
Hi Both, Many thanks for your rapid response, much appreciated. Cheers Chris

Re: [Rdkit-discuss] Pandas to Excel

2024-02-22 Thread Taka Seri
Hi Chris, I think you can do it with SaveXlsxFromFrame. http://rdkit.org/docs/source/rdkit.Chem.PandasTools.html#SaveXlsxFromFrame rdkit.Chem.PandasTools.SaveXlsxFromFrame(frame, outFile, molCol='ROMol', size=(300, 300), formats=None)

Re: [Rdkit-discuss] Pandas

2016-11-26 Thread Chris Swain
Search and add similarity to resulting data frame > On 27 Nov 2016, at 07:55, Greg Landrum wrote: > > > You don't know if what could be done as a single line? > > -greg --

Re: [Rdkit-discuss] Pandas

2016-11-26 Thread Greg Landrum
On Sun, Nov 27, 2016 at 8:45 AM, Chris Swain wrote: > I added the similarity scores > by adding an extra line, > > sdf['sim']=DataStructs.BulkTanimotoSimilarity(ionised_fps,sdf['mfp2']) > Yes, that's what I would have done. > I don’t know if it could be done in a single line? > You don't know

Re: [Rdkit-discuss] Pandas

2016-11-26 Thread Chris Swain
I added the similarity scores by adding an extra line, sdf['sim']=DataStructs.BulkTanimotoSimilarity(ionised_fps,sdf['mfp2']) I don’t know if it could be done in a single line? Chris > On 26 Nov 2016, at 04:48, Greg Landrum wrote: > > That's a good question. > > I'm not a master of pandas

Re: [Rdkit-discuss] Pandas

2016-11-26 Thread Chris Swain
This works very nicely; it would be nice to add the similarity scores to the resulting data frame. Cheers, Chris > On 26 Nov 2016, at 04:48, Greg Landrum wrote: > > That's a good question. > > I'm not a master of pandas indexing, but this seems to work: > In [5]: sdf['mfp2'] = [rdMolDescript

Re: [Rdkit-discuss] Pandas

2016-11-25 Thread Greg Landrum
That's a good question. I'm not a master of pandas indexing, but this seems to work: In [5]: sdf['mfp2'] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(x,2) for x in sdf['ROMol']] In [8]: sims = DataStructs.BulkTanimotoSimilarity(qry,sdf['mfp2']) In [13]: ids = [x for x,y in enumerate(sims) if
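The pattern in the message above (compute bulk similarities against a query, then keep the indices that pass a cutoff) can be sketched without RDKit. This is only an illustration under stated assumptions: plain Python sets of on-bit indices stand in for Morgan bit vectors, `tanimoto` and `bulk_tanimoto` are hypothetical helpers rather than RDKit functions, and the 0.7 threshold is invented because the original condition is cut off.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on-bit indices: |a & b| / |a | b|."""
    union = len(a | b)
    # Returning 0.0 for two empty sets is a choice made for this sketch only.
    return len(a & b) / union if union else 0.0


def bulk_tanimoto(query, fingerprints):
    """Similarity of one query fingerprint against a list of fingerprints."""
    return [tanimoto(query, fp) for fp in fingerprints]


# Toy fingerprints (hypothetical data standing in for the 'mfp2' column).
fps = [{1, 2, 3, 4}, {1, 2, 3}, {7, 8}, {1, 2, 3, 4, 5}]
query = {1, 2, 3, 4}
threshold = 0.7  # assumed cutoff, not taken from the thread

sims = bulk_tanimoto(query, fps)
# Row positions whose similarity passes the cutoff, as in the list comprehension above.
ids = [i for i, s in enumerate(sims) if s >= threshold]
# Pair each hit with its score so the similarity can travel with the selected rows.
hits = [(i, sims[i]) for i in ids]
```

With a real data frame, the same index list could be used to select rows, and the scores could be stored as an extra column, which is what the follow-up messages in this thread discuss.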

Re: [Rdkit-discuss] Pandas

2016-11-23 Thread Brian Kelley
Peter, If you have chemfp and can make a chemfp arena, RDKit now supports these structures for reading and searching. This, by far, is the fastest way I know of similarity searching. I believe that Greg's implementation is compatible with chemfp 1.0 which is available on pypi: https://pypi.pyt

Re: [Rdkit-discuss] Pandas

2016-11-23 Thread Peter Gedeck
Is it possible to use the bulk similarity searching functionality for better performance instead of the list comprehension? Best, Peter On Wed, Nov 23, 2016 at 9:11 AM Greg Landrum wrote: No worries. This, and Anna's question about similarity searching and clustering illustrate a great opport

Re: [Rdkit-discuss] Pandas

2016-11-23 Thread Greg Landrum
No worries. This, and Anna's question about similarity searching and clustering illustrate a great opportunity for a tutorial on fingerprints and similarity searching. -greg On Wed, Nov 23, 2016 at 3:00 PM +0100, "Chris Swain" wrote: Thanks for this, As a chemist who comes from t

Re: [Rdkit-discuss] Pandas

2016-11-23 Thread Chris Swain
Thanks for this, As a chemist who comes from the “cut and paste” school of scripting I’m always concerned I’m asking something blindingly obvious ;-) Chris > On 23 Nov 2016, at 12:36, Greg Landrum wrote: > > [including rdkit-discuss, because it's relevant there and I'm pretty sure > Chris wo

Re: [Rdkit-discuss] Pandas

2016-11-23 Thread Greg Landrum
[including rdkit-discuss, because it's relevant there and I'm pretty sure Chris won't mind and the real Pandas experts may have a better answer than me.] On Wed, Nov 23, 2016 at 9:51 AM, Chris Swain wrote: > > I quite like storing molecules and associated data in a data frame and > I’ve seen that

Re: [Rdkit-discuss] Pandas dataframe manipulation

2016-03-11 Thread Paul Czodrowski
[0] == '>' else x}).dropna(axis=0) Paul From: Maciek Wójcikowski [mailto:mac...@wojcikowski.pl] Sent: Friday, 11 March 2016 12:29 To: Paul Czodrowski Cc: rdkit Subject: Re: [Rdkit-discuss] Pandas dataframe manipulation Hi Paul, I would suggest: * assigning dtype of dataframe

Re: [Rdkit-discuss] Pandas dataframe manipulation

2016-03-11 Thread Maciek Wójcikowski
Hi Paul, I would suggest: - assigning dtype of dataframe/column to str/np.object - cleaning up the IC50s - casting to float/int with dataframe.astype() Or alternatively you could use "converters" argument: pd.read_csv('filename.csv', converters={'ic50_colname': lambda x: x.replace('>', ''
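The "clean up, then cast" step suggested above can be sketched as a small function. This is a minimal illustration using only the standard library; `clean_ic50`, the `ic50` column name, and the sample rows are hypothetical, and stripping the qualifier is a simplification that discards the information that a value was only a bound. A function of this shape could presumably also be passed through the `converters` argument mentioned in the message.

```python
import csv
import io


def clean_ic50(raw):
    """Strip a leading '>' or '<' qualifier and return a float, or None if not numeric."""
    text = raw.strip().lstrip("><").strip()
    try:
        return float(text)
    except ValueError:
        return None


# Hypothetical input standing in for 'filename.csv'.
csv_text = "name,ic50\ncmpd1,>10000\ncmpd2,12.5\ncmpd3,n.d.\n"

# Apply the cleaning function to the IC50 column while reading each row.
rows = [
    {"name": r["name"], "ic50": clean_ic50(r["ic50"])}
    for r in csv.DictReader(io.StringIO(csv_text))
]

# Drop rows whose IC50 could not be parsed, similar in spirit to dropna().
kept = [r for r in rows if r["ic50"] is not None]
```

Whether to drop qualified values such as ">10000" or keep them with the qualifier stripped depends on the analysis; the sketch keeps them.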

Re: [Rdkit-discuss] pandas / sd-tags

2013-07-02 Thread Nikolas Fechner
Hi Paul, I'll answer directly below. > Dear Niko, > > I was exactly looking for this functionality, great work! > > A few follow-up questions: > * frame.set_index('_Name') did not work, but there is a name set in the SD > file. The molecule name is contained in the column specified by the opti

Re: [Rdkit-discuss] pandas / sd-tags

2013-07-01 Thread Paul . Czodrowski
Dear Niko, I was exactly looking for this functionality, great work! A few follow-up questions: * frame.set_index('_Name') did not work, but there is a name set in the SD file. * Is there a way to load in only a specified list of SD tags? (I didn't find a "names" parameter for LoadSDF) * frame.

Re: [Rdkit-discuss] pandas / sd-tags

2013-07-01 Thread Nikolas Fechner
Hi Paul, I am not sure if it is easily doable to get the pandas read_table function to handle sd-files. However, there is some basic functionality for this already built into the PandasTools module. If you check the doctest header there is a small example. Basically, frame = PandasTools.Loa