[Rdkit-devel] pandastool molecules -> excel

Rex Dwyer via Rdkit-devel Wed, 24 Mar 2021 14:48:31 -0700

Hi,
I tried to use the rdkit.Chem.PandasTools.SaveXlsxFromFrame function, and found 
that it didn't work for me:


  *   It was only possible to have one column with molecules.
  *   It moved the molecule column to the left side of the spreadsheet.
  *   It did not preserve hyperlinks.
I'm sending along this alternative.  If someone wants to take it and run with 
it, that's great, and no need to give me any credit.
If not, I'm sorry, but my workload doesn't permit me to spend time to go 
through all the usual procedures for contributing.
The technical insight here is to rely on DataFrame.to_excel to fill the 
spreadsheet.  It preserves all the data (such as links) as stored in the 
dataframe.  Then the columns that need graphics can be modified after that.
I hope this is useful.
Cheers,
Rex


def to_excel_with_molecules(df, xlsx_name, mol_cols, sheet_name="Sheet1", 
size=(300, 300)):
    """
    Writes a dataframe into an excel spreadsheet, adding graphic of molecule to 
specified columns.
    @param df: dataframe to write
    @type df: dataframe
    @param xlsx_name: name of output excel file.  Must end ".xlsx"
    @type xlsx_name: str
    @param mol_cols: a column or list of columns with smiles strings to be 
expanded.
    If any element of the columns can't be parsed as as SMILES string, it will 
be left alone.
    @type mol_cols: str or list of str (actually, any iterable of str)
    @param sheet_name: The name of the sheet in the workbook.
    @type sheet_name: str
    @param size: (width,height) of images
    @type size: tuple (could be list)
    @return: None
    @rtype: NoneType
    """
    df = df.copy()

    if type(mol_cols) is str:
        mol_cols = [mol_cols]
    mol_cols = list(mol_cols)

    writer = pd.ExcelWriter(xlsx_name, engine="xlsxwriter")
    df.to_excel(writer, sheet_name=sheet_name, index=False)
    workbook = writer.book
    worksheet = writer.sheets[sheet_name]

    mol_cols = [i for i, col in enumerate(df.columns) if col in mol_cols]
    df.columns = list(range(len(df.columns)))

    for c in mol_cols:
        worksheet.set_column(c, c, width=size[0] / 6)
        for rr, smiles in enumerate(df[c]):
            try:
                r = rr + 1  # row 0 of df will be in row 1 of spreadsheet.
                image_data = BytesIO()
                img = MolToImage(MolFromSmiles(smiles), size=size)
                img.save(image_data, format="PNG")
                worksheet.set_row(r, height=size[1])
                worksheet.insert_image(r, c, "f", {"image_data": image_data})
            except:  # Some are expected to fail, e.g., None when there is no 
matching molecule.
                pass

    workbook.close()
    return

DISCLAIMER: This email and any file transmitted with it are confidential and 
intended solely for the use of the individual or entity to whom they are 
addressed. If you are not the named addressee, you should not disseminate, 
distribute or copy this e-mail. Please notify the sender or the system manager 
and delete or destroy this email and any attachment immediately.

_______________________________________________
Rdkit-devel mailing list
Rdkit-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-devel

[Rdkit-devel] pandastool molecules -> excel

Reply via email to