As the subject suggests, I'm trying to find a universal solution for
reading CSVs via the SmilesMolSupplier. The input could have a single
column or multiple columns, so using the pandas tools for interconversion
is overkill.

The general structure I use for analysing the CSV is:


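import csv

sniffer = csv.Sniffer()
with open(chem_file_name, "r", newline="") as csv_upload_file:
    # Sniff a few KB, not one line: has_header() compares row 1 to later rows.
    sample = csv_upload_file.read(4096)
    dialect = sniffer.sniff(sample)
    has_header = sniffer.has_header(sample)
    # No explicit close() needed; the with block closes the file.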

supplier = Chem.SmilesMolSupplier(
    chem_file_name,
    delimiter=str(dialect.delimiter),
    smilesColumn=smi_col_header,
    nameColumn=-1, titleLine=has_header)

If I use a CSV without quoted data, this is fine: I can autodetect the
delimiter, the column header is loaded in by the rest of my workflow, and
everything else is worked out through the CSV sniffer. However, where the
data is quoted, the actual parsing fails because of the quote marks:

[10:09:56] SMILES Parse Error: syntax error for input: '"C1=CC=CC=C1"'
[10:09:56] ERROR: Smiles parse error on line 1

Is there some easy way of handling this, or do I have to mandate not using
quoting of data in the CSV generation?
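
For reference, one possible workaround is to let Python's csv module strip
the quoting and then hand the cleaned text to the supplier through SetData
instead of a file name. The following is only a minimal sketch, not a
confirmed RDKit recipe: the helper names unquote_csv_text and
supplier_from_quoted_csv are hypothetical, the dialect argument is assumed
to come from the sniffer step above, smiles_col is assumed to be an integer
column index, and no field is assumed to contain a tab or a newline.

```python
import csv


def unquote_csv_text(path, dialect=csv.excel):
    """Return the file's rows as tab-separated text with CSV quoting removed.

    Hypothetical helper; assumes no field contains a tab or a newline.
    """
    with open(path, "r", newline="") as handle:
        # csv.reader applies the dialect's quoting rules, so the quote
        # characters never reach the SMILES parser.
        rows = list(csv.reader(handle, dialect))
    return "\n".join("\t".join(row) for row in rows) + "\n"


def supplier_from_quoted_csv(path, dialect=csv.excel, smiles_col=0,
                             has_header=True):
    """Build a SmilesMolSupplier from the unquoted text (hypothetical helper)."""
    from rdkit import Chem  # imported lazily so the helper above needs no RDKit

    supplier = Chem.SmilesMolSupplier()
    supplier.SetData(
        unquote_csv_text(path, dialect),
        delimiter="\t",
        smilesColumn=smiles_col,
        nameColumn=-1,
        titleLine=has_header,
    )
    return supplier
```

With the variables from the snippet above, this would be called as
supplier_from_quoted_csv(chem_file_name, dialect, 0, has_header). Re-joining
on tabs is a deliberate simplification: it sidesteps quoting entirely, at
the cost of breaking on fields that themselves contain tabs.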
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
