[issue43625] CSV has_headers heuristic could be improved

2021-07-30 Thread Łukasz Langa
Change by Łukasz Langa: resolution: -> fixed; stage: patch review -> resolved; status: open -> closed; type: -> behavior

[issue43625] CSV has_headers heuristic could be improved

2021-07-30 Thread Łukasz Langa
Łukasz Langa added the comment: New changeset 440c9f772a9b66c1ea387c1c3efc9ff438880acf by Miss Islington (bot) in branch '3.10': bpo-43625: Enhance csv sniffer has_headers() to be more accurate (GH-26939) (GH-27494)

[issue43625] CSV has_headers heuristic could be improved

2021-07-30 Thread Andrei Kulakov
Andrei Kulakov added the comment: Łukasz: I agree.

[issue43625] CSV has_headers heuristic could be improved

2021-07-30 Thread Łukasz Langa
Łukasz Langa added the comment: Since this is a heuristic change, I'm thinking we shouldn't backport this to 3.9. This way it will be easier for application authors to notice the change and control it within their apps. -- versions: +Python 3.10

[issue43625] CSV has_headers heuristic could be improved

2021-07-30 Thread miss-islington
Change by miss-islington: nosy: +miss-islington; nosy_count: 5.0 -> 6.0; pull_requests: +26010; pull_request: https://github.com/python/cpython/pull/27494

[issue43625] CSV has_headers heuristic could be improved

2021-07-30 Thread Łukasz Langa
Łukasz Langa added the comment: New changeset ceea579ccc51791f3e115155d6f27905bc7544a9 by andrei kulakov in branch 'main': bpo-43625: Enhance csv sniffer has_headers() to be more accurate (GH-26939) https://github.com/python/cpython/commit/ceea579ccc51791f3e115155d6f27905bc7544a9 --

[issue43625] CSV has_headers heuristic could be improved

2021-06-29 Thread Andrei Kulakov
Andrei Kulakov added the comment: Skip: I've updated the PR. I expanded the docs a bit; I think they give a good rough idea of the logic, but I feel they can probably be improved further. Let me know what you think! I spent some time thinking about it but found it tricky! :) --

[issue43625] CSV has_headers heuristic could be improved

2021-06-29 Thread Skip Montanaro
Skip Montanaro added the comment: Here's a NEWS entry. -- Added file: https://bugs.python.org/file50132/2021-06-29-07-27-08.bpo-43625.ZlAxhp.rst

[issue43625] CSV has_headers heuristic could be improved

2021-06-29 Thread Skip Montanaro
Skip Montanaro added the comment: Here is a change to the has_header documentation and an extra test case documenting the behavior when the sample contains strings. I'm not sure about the wording of the doc change, perhaps you can tweak it? Seems kind of clumsy to me. If it seems okay to

[issue43625] CSV has_headers heuristic could be improved

2021-06-29 Thread Skip Montanaro
Skip Montanaro added the comment: I retract my comment about fixed-length strings in the non-numeric case. There are clearly test cases (which I probably wrote, considering the values) where the sample has a header but the values are of varying length. Misread of the code on my part. I have

[issue43625] CSV has_headers heuristic could be improved

2021-06-29 Thread Skip Montanaro
Skip Montanaro added the comment: Thanks @andrei.avk. You are right, only the complex test is required. I suppose it's okay to commit this, but reviewing the full code of the has_header method leaves me thinking this is just putting lipstick on a pig. If I read the code correctly, there are

[issue43625] CSV has_headers heuristic could be improved

2021-06-28 Thread Andrei Kulakov
Andrei Kulakov added the comment: Skip: If I understand right, in the patch the last two types, float and int, will never have an effect, because if float(x) and int(x) succeed, so will complex(x), and conversely, if complex(x) fails, float and int will also fail. So the effect of the
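
The point about complex() subsuming int() and float() can be checked with a small sketch. This is not code from the patch; the helper name parses_as and the sample values are made up for illustration, and the check only covers the strings listed here.

```python
def parses_as(kind, text):
    """Return True if kind(text) succeeds (hypothetical helper, not from csv.py)."""
    try:
        kind(text)
    except (ValueError, OverflowError):
        return False
    return True


# Illustrative cell values: numeric strings plus the header names from the report.
samples = ["0", "0.5", "-7", "1e3", "time", "forces", ""]

# Every sample accepted by int() or float() should also be accepted by complex().
subsumed = all(
    parses_as(complex, s)
    for s in samples
    if parses_as(int, s) or parses_as(float, s)
)

# Conversely, every sample rejected by complex() should be rejected by both others.
converse = all(
    not parses_as(int, s) and not parses_as(float, s)
    for s in samples
    if not parses_as(complex, s)
)

print(subsumed, converse)
```

For these samples, a single complex() test therefore classifies each cell the same way as the int/float/complex chain would, which is the argument for keeping only the complex test.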

[issue43625] CSV has_headers heuristic could be improved

2021-06-28 Thread Andrei Kulakov
Andrei Kulakov added the comment: I've added the PR here, based on Skip's patch: https://github.com/python/cpython/pull/26939

[issue43625] CSV has_headers heuristic could be improved

2021-06-28 Thread Andrei Kulakov
Change by Andrei Kulakov: nosy: +andrei.avk; nosy_count: 3.0 -> 4.0; pull_requests: +25507; stage: -> patch review; pull_request: https://github.com/python/cpython/pull/26939

[issue43625] CSV has_headers heuristic could be improved

2021-03-25 Thread Skip Montanaro
Skip Montanaro added the comment: I assume the OP is referring to this sort of usage:

    >>> sniffer = csv.Sniffer()
    >>> raw = open("mixed.csv").read()
    >>> sniffer.has_header(raw)
    False

*sigh* I really wish the Sniffer class had never been added to the CSV module. I can't recall who wrote it

[issue43625] CSV has_headers heuristic could be improved

2021-03-25 Thread Raymond Hettinger
Change by Raymond Hettinger: nosy: +rhettinger

[issue43625] CSV has_headers heuristic could be improved

2021-03-25 Thread Raymond Hettinger
Change by Raymond Hettinger: nosy: +skip.montanaro

[issue43625] CSV has_headers heuristic could be improved

2021-03-25 Thread ejacq
New submission from ejacq <0pyth...@jesuislibre.net>: Here is a sample of CSV input:

    "time","forces"
    0,0
    0.5,0.9

When calling has_header() from csv.py on this sample, it returns False. Why? Because 0 and 0.5 don't belong to the same type, and thus the column is discarded by the heuristic. I
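
A minimal way to reproduce the report is sketched below. It assumes the sample is exactly the three lines quoted above, separated by newlines. According to the rest of this thread, the outcome depends on the interpreter: the fix landed in main and 3.10 and was not backported to 3.9, so older versions are expected to show the behavior reported here.

```python
import csv

# The sample from the report: a quoted header row followed by two numeric rows.
sample = '"time","forces"\n0,0\n0.5,0.9\n'

# has_header() sniffs the dialect from the sample, then compares the first
# row against the types it inferred for the remaining rows.
result = csv.Sniffer().has_header(sample)
print(result)
```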