Re: [Rdkit-discuss] Using SmilesMolSuplier with CSV containing quotemarks

Tim Dudgeon Mon, 10 Jan 2022 06:12:23 -0800

I think you will have to add a preprocessing step that strips the quote
marks, if present, before the data reaches SmilesMolSupplier.
Tim
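
One possible shape for such a step, as a minimal sketch using only the
standard library: parse the text with csv.reader (which understands
quoting) and re-join the fields without quotes. The function name
strip_csv_quotes is hypothetical, and the sketch assumes no field contains
the delimiter itself, which should hold for SMILES columns but may not for
free-text columns.

```python
import csv
import io


def strip_csv_quotes(text, delimiter=","):
    """Return CSV text with quoting removed from every field.

    Hypothetical helper, not part of RDKit. Assumes no field contains
    the delimiter, since the output is written without any quoting.
    """
    # csv.reader interprets the quote marks and yields bare field values;
    # backslashes (used in SMILES for cis/trans bonds) are left untouched.
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    return "".join(delimiter.join(row) + "\n" for row in rows)


raw = 'smiles,name\n"C1=CC=CC=C1","benzene"\n'
cleaned = strip_csv_quotes(raw)
print(cleaned)
```

The cleaned text could then be written to a temporary file for
SmilesMolSupplier, or, assuming your RDKit version provides it, passed to
the supplier's SetData method with the same delimiter and column arguments.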


On Mon, Jan 10, 2022 at 10:15 AM James Wallace <jeawall...@gmail.com> wrote:

> As the subject suggests, I'm trying to find a universal solution for
> reading CSVs via the SmilesMolSupplier (as the input setup could be single
> column or multiple column, using the pandas tools for interconversion is
> overkill)
>
> The general structure I use for analysing the CSV is:
>
>
> import csv
> from rdkit import Chem
>
> sniffer = csv.Sniffer()
> with open(chem_file_name, "r", newline="") as csv_upload_file:
>     # Sniff a larger sample: a single line gives has_header no data
>     # rows to compare the first row against.
>     sample = csv_upload_file.read(4096)
>     dialect = sniffer.sniff(sample)
>     has_header = sniffer.has_header(sample)
>
> supplier = Chem.SmilesMolSupplier(
>     chem_file_name, delimiter=str(dialect.delimiter),
>     smilesColumn=smi_col_header, nameColumn=-1, titleLine=has_header)
>
> If I use a CSV without quoted data, this works: I can autodetect the
> delimiter, the column header is supplied by the rest of my workflow, and
> everything else is worked out by the CSV sniffer. However, when the data
> is quoted, the actual parsing fails because of the quote marks:
>
> [10:09:56] SMILES Parse Error: syntax error for input: '"C1=CC=CC=C1"'
> [10:09:56] ERROR: Smiles parse error on line 1
>
> Is there some easy way of handling this, or do I have to mandate not using
> quoting of data in the CSV generation?
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>

