Skip Montanaro added the comment:
I assume the OP is referring to this sort of usage:
>>> import csv
>>> sniffer = csv.Sniffer()
>>> raw = open("mixed.csv").read()
>>> sniffer.has_header(raw)
False
*sigh*
I really wish the Sniffer class had never been added to the CSV module. I can't
recall who wrote it (the author is long gone). Though I am responsible for the
initial commits, it wasn't me or the main authors of csvmodule.c. As far as I
know, it never really worked well. I can't recall ever using it.
A simpler heuristic would be this: if the first row contains mostly strings
and the second row contains mostly numbers, then the file has a header. That
assumes that CSV files consist mostly of numeric data.
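A minimal sketch of that simpler heuristic follows. It is not part of the csv
module; looks_like_header and _is_number are hypothetical names, and reading
"a bunch of" as "no numeric cells in row one, more than half numeric in row
two" is my own assumption.

```python
import csv
import io


def _is_number(text):
    """Return True if text parses as a float (hypothetical helper)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def looks_like_header(sample):
    """Guess whether the first row of a CSV sample is a header.

    Assumption: a header row has no numeric cells, and the first data
    row is mostly (more than half) numeric.
    """
    rows = list(csv.reader(io.StringIO(sample)))
    if len(rows) < 2:
        return False  # nothing to compare the first row against
    first, second = rows[0], rows[1]
    if not first or not second:
        return False
    first_is_text = not any(_is_number(cell) for cell in first)
    numeric_cells = sum(_is_number(cell) for cell in second)
    return first_is_text and numeric_cells > len(second) / 2
```

As noted, this only helps when the data is mostly numeric; a file of names
and addresses would always be reported as having no header.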
Looking at has_header, I see this:
for thisType in [int, float, complex]:
I think this particular problem would be solved if the order of those types
were reversed. The attached diff suggests that as well. Note that the Sniffer
class currently contains no test cases, so the fact that the test I added
failed before the change and passes after it doesn't mean the change won't
break someone's mission-critical Sniffer usage.
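To illustrate why the order matters, here is a small sketch. It is not the
actual has_header code; first_matching_type is a hypothetical helper that
mimics only the "first type that parses wins" loop. Assuming a column mixes
integers and floats, the original order assigns different types to different
rows, while the reversed order assigns complex to every numeric cell.

```python
def first_matching_type(value, candidates):
    """Return the first type in candidates that can parse value, else None."""
    for candidate in candidates:
        try:
            candidate(value)
            return candidate
        except (ValueError, OverflowError):
            continue
    return None


column = ["1", "2.5", "3"]  # made-up sample column mixing ints and floats

# Original order: "1" and "3" match int, but "2.5" falls through to float.
original_order = [first_matching_type(v, [int, float, complex]) for v in column]

# Reversed order: every numeric string parses as complex first.
reversed_order = [first_matching_type(v, [complex, float, int]) for v in column]

print(original_order)
print(reversed_order)
```

With the reversed order the column looks uniformly typed, so a non-numeric
first row stands out against it.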
(Sorry, Raymond. My GitHub-fu is insufficient to allow me to fork, apply the
diff and create a PR.)
----------
keywords: +patch
Added file: https://bugs.python.org/file49915/csv.diff
_______________________________________
Python tracker
<https://bugs.python.org/issue43625>
_______________________________________