As the subject suggests, I'm trying to find a universal way of reading CSV files with SmilesMolSupplier. The input could have a single column or multiple columns, so using the pandas tools for interconversion is overkill.
The general structure I use for analysing the CSV is:

    import csv
    from rdkit import Chem

    sniffer = csv.Sniffer()
    # Sniff the delimiter and header from the first line only;
    # the with block closes the file, so no explicit close() is needed
    with open(chem_file_name, "r") as csv_upload_file:
        first_line = csv_upload_file.readline()
    dialect = sniffer.sniff(first_line)
    has_header = sniffer.has_header(first_line)

    supplier = Chem.SmilesMolSupplier(chem_file_name, delimiter=str(dialect.delimiter),
                                      smilesColumn=smi_col_header, nameColumn=-1,
                                      titleLine=has_header)

If I use a CSV without quoted data, this is fine: I can autodetect the delimiter, the column header is loaded in by the rest of my workflow, and everything else is worked out through the CSV sniffer.

However, where the data is quoted, the actual parsing fails because of the quotation marks:

    [10:09:56] SMILES Parse Error: syntax error for input: '"C1=CC=CC=C1"'
    [10:09:56] ERROR: Smiles parse error on line 1

Is there some easy way of handling this, or do I have to mandate that the data is not quoted when the CSV is generated?
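For reference, one possible workaround is to let Python's csv module do the row parsing, since csv.reader strips the quoting that SmilesMolSupplier passes through untouched. The following is only a minimal sketch under stated assumptions, not a confirmed RDKit recipe: the function name iter_smiles and its parameters (smiles_column as a zero-based index, has_header, sample_size) are hypothetical. It sniffs a larger sample than just the first line, restricts the candidate delimiters, and falls back to the default comma dialect when nothing can be detected, as can happen with a single-column file.

```python
import csv


def iter_smiles(path, smiles_column=0, has_header=None, sample_size=4096):
    """Yield unquoted SMILES strings from a delimited text file.

    All names here are illustrative. smiles_column is a zero-based
    column index; has_header=None means "ask csv.Sniffer".
    """
    sniffer = csv.Sniffer()
    with open(path, newline="") as handle:
        sample = handle.read(sample_size)
        handle.seek(0)
        try:
            # Restrict candidates so characters inside SMILES are not guessed
            dialect = sniffer.sniff(sample, delimiters=",;\t|")
        except csv.Error:
            # Assumption: fall back to the default comma dialect
            # when no delimiter can be determined
            dialect = csv.excel
        if has_header is None:
            # Heuristic only; pass has_header explicitly if it is known
            has_header = sniffer.has_header(sample)
        reader = csv.reader(handle, dialect)
        if has_header:
            next(reader, None)  # skip the title line
        for row in reader:
            if row:  # ignore blank lines
                yield row[smiles_column].strip()
```

Each yielded string could then be passed to Chem.MolFromSmiles instead of going through SmilesMolSupplier, at the cost of handling names and other columns by hand.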