I think you have to add a step that removes the quote marks if they are present?

Tim
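For what it's worth, here is a rough sketch of what such a step might look like. It assumes you are happy to let Python's csv module do the unquoting and then hand the bare strings to RDKit yourself, rather than pointing SmilesMolSupplier at the file. The helper name iter_smiles and its parameters are made up for illustration; the delimiter and header flag would come from whatever your sniffing step already detects.

```python
import csv
import io


def iter_smiles(lines, delimiter=",", smiles_column=0, has_header=False):
    """Yield SMILES strings with any surrounding CSV quoting removed.

    `lines` is any iterable of text lines, e.g. an open file object.
    (Hypothetical helper, not part of RDKit.)
    """
    # csv.reader strips the quote characters from quoted fields and
    # leaves unquoted fields untouched, so both styles are handled.
    reader = csv.reader(lines, delimiter=delimiter, quotechar='"')
    if has_header:
        next(reader, None)  # skip the title line
    for row in reader:
        if row:  # ignore blank lines
            yield row[smiles_column].strip()


# Example input mixing quoted and unquoted rows.
quoted = io.StringIO('"smiles","name"\n"C1=CC=CC=C1","benzene"\nCCO,ethanol\n')
smiles = list(iter_smiles(quoted, has_header=True))
print(smiles)
```

Each string that comes out can then be passed to Chem.MolFromSmiles, which returns None for anything it cannot parse, so bad rows can be checked one at a time.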
On Mon, Jan 10, 2022 at 10:15 AM James Wallace <jeawall...@gmail.com> wrote:

> As the subject suggests, I'm trying to find a universal solution for
> reading CSVs via the SmilesMolSupplier (as the input setup could be single
> column or multiple column, using the pandas tools for interconversion is
> overkill)
>
> The general structure I use for analysing the CSV is:
>
>
> with open(chem_file_name, "r") as csv_upload_file:
>     first_line = csv_upload_file.readline()
>     dialect = sniffer.sniff(first_line)
>     has_header = sniffer.has_header(first_line)
>     csv_upload_file.close()
>
> supplier = Chem.SmilesMolSupplier(chem_file_name,
>     delimiter=str(dialect.delimiter), smilesColumn=smi_col_header,
>     nameColumn=-1, titleLine=has_header)
>
> If I use a CSV without quoted data, this is fine, I can autodetect the
> delimiter, the column header is loaded in by the rest of my workflow,
> everything else is worked out through the CSV sniffer. However, where it is
> quoted data, the actual parsing will fail because of the quotemarks.
>
> [10:09:56] SMILES Parse Error: syntax error for input: '"C1=CC=CC=C1"'
> [10:09:56] ERROR: Smiles parse error on line 1
>
> Is there some easy way of handling this, or do I have to mandate not using
> quoting of data in the CSV generation?
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss