Hi Paul,
I would suggest:
- reading the dataframe/column with dtype str (or object)
- cleaning up the IC50s
- casting to float/int with DataFrame.astype()
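
A minimal sketch of those three steps follows. I don't have the attached CSV at hand, so the inline data and the column name "IC50" are assumptions. A single ">2" in a CSV column makes pandas read the whole column as strings, which is presumably why the np.isreal filter then throws away every row.

```python
import io

import pandas as pd

# Hypothetical stand-in for the attached CSV; the column name "IC50" is assumed.
csv_text = "item,IC50\na,1\nb,2\nc,>2\nd,5\n"

# Step 1: read the column explicitly as strings.
raw = pd.read_csv(io.StringIO(csv_text), dtype={"IC50": str})

# Step 2: clean up the IC50s, here by dropping a leading '>' qualifier.
cleaned = raw["IC50"].str.strip().str.lstrip(">")

# Step 3: cast the cleaned strings to float.
df = raw.copy()
df["IC50"] = cleaned.astype(float)

# Alternative to steps 2-3 if qualified values such as ">2" should be
# discarded instead: non-numeric strings become NaN and are filtered out.
numeric = pd.to_numeric(raw["IC50"], errors="coerce")
df_dropped = raw[numeric.notna()]
```

Keep in mind that stripping '>' turns ">2" into 2.0, which may or may not be what you want for censored measurements; the to_numeric variant drops those rows instead.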
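Alternatively, you could use the "converters" argument of read_csv:
pd.read_csv('filename.csv',
            converters={'ic50_colname': lambda x: x.replace('>', '')})
Note that this converter returns strings, so the column still needs a
cast to float afterwards.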
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
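
As a rough sketch of the converters route with the cast done inside the converter: the inline CSV, the column name "IC50" and the helper parse_ic50 are all made up for illustration, and the helper assumes every cell is non-empty and numeric once the '>' is removed (float('') would raise a ValueError).

```python
import io

import pandas as pd

# Hypothetical stand-in for the attached CSV; the column name "IC50" is assumed.
csv_text = "item,IC50\na,1\nb,2\nc,>2\nd,5\n"


def parse_ic50(text):
    """Hypothetical helper: strip a leading '>' and convert the cell to float."""
    return float(text.strip().lstrip(">"))


# read_csv passes each raw cell of the "IC50" column to the converter as a string.
df = pd.read_csv(io.StringIO(csv_text), converters={"IC50": parse_ic50})
```

Because the conversion happens while the file is parsed, no separate astype() call is needed in this variant.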
----
Pozdrawiam, | Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl
2016-03-11 11:12 GMT+01:00 Paul Czodrowski <paul.czodrow...@merckgroup.com>:
> Dear RDKitter & Pandas-Dataframes heavy users,
>
> please find below a question concerning the conversion of pandas
> dataframes:
>
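> import numpy as np
> import pandas as pd
>
> df = pd.DataFrame({"item": ["a", "b", "c", "d", "e"],
>                    "row1": [1, 2, 3, ">2", 5],
>                    "row2": [0.1, 0.2, 0.3, 0.4, 0.5],
>                    "row3": ["ab", "cd", "ed", "gh", "ij"]})
>
> # keep only the rows whose "row1" entry is a real number
> df_new = df[df[["row1"]].applymap(np.isreal).all(1)]
>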
> I would like to get rid of this nasty ">2" entry in "row1" => This works
> perfectly given the snippet above.
>
> However, when I read in a CSV file containing similar data (see the
> attached CSV) => The conversion does not work: all values in the IC50
> column are discarded and end up as "NaN".
>
> What is going wrong?
>
> Thanks & Cheers,
>
> Paul
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>