Bug#824252: -t and -d don't work as expected

2017-01-07 Thread Craig Sanders
On Sat, Jan 07, 2017 at 10:28:14PM -0500, Sandro Tosi wrote:
> upstream at https://github.com/wireservice/csvkit/issues/666 closed
> the issue with
> 
> ```
> Not a bug. You must pipe the output of a command into csvformat to
> change the output format.
> ```
> 
> hence i'm closing this report as well. If you still think something
> needs fixing, i suggest to continue discussing that upstream on the
> mentioned github issue

The upstream response misses the point (and the problem) entirely.

Here's the real issue, quoted from my original bug report.

 > Using tr or similar to work around this is unreliable - there may be
 > actual commas in any of the tab-separated fields.

(i made a mistake there. i meant to say "any of the comma-separated
fields", not "tab-separated").

if there are commas inside the fields, in the actual data, it is
**impossible** for any other program that post-processes csvkit's
output to distinguish a comma within a field from a comma that
separates fields.  That includes csvformat as well as tr, awk, sed, or
any other tool.

THAT is why being able to specify the output field separator is
essential - so you can choose a delimiter that ISN'T in the data.

if csvkit can't do that, then it's not fit for purpose.

This sort of data vs delimiter problem is very common. If you can choose
the delimiter, then it's a trivial problem to solve, easily scripted.
If you can't choose, then the **only** way to be certain that your
output isn't going to be garbled by an extra comma or tab is to visually
inspect it and edit it by hand - it is a task that requires human-level
intelligence and great patience and attention to detail (or at least AI
parsing of the data).
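
To illustrate the "easily scripted" case, here is a minimal Python sketch
(not csvkit code). It assumes the rows are already parsed into lists of
strings; the function name, the candidate list, and the sample data are
all hypothetical. It picks the first candidate delimiter that occurs in
no field and writes the rows with it.

```python
import csv
import io

def pick_unused_delimiter(rows, candidates=("\t", "|", ";")):
    """Return the first candidate character that appears in no field."""
    for delim in candidates:
        if not any(delim in field for row in rows for field in row):
            return delim
    raise ValueError("every candidate delimiter occurs in the data")

# Hypothetical sample: one field contains a comma, another contains a tab.
rows = [["name", "address"], ["Smith, John", "1 Main St\tUnit 2"]]

delim = pick_unused_delimiter(rows)

# Write the rows using the chosen delimiter.
out = io.StringIO()
csv.writer(out, delimiter=delim, lineterminator="\n").writerows(rows)
text = out.getvalue()
```

With this sample, both comma and tab occur in the data, so the sketch
falls through to the next candidate.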



I'll leave it up to you to decide whether it's worth re-opening this
bug; upstream seems either indifferent or hostile to fixing it.

I will open another bug report, though. Looking at the github repo, it
seems that documentation for csvkit does exist.  I thought there wasn't
any, but there is, in .rst format - it's just missing from the debian
packages. rst files can easily be converted to manpages, html, epub,
pdf, and other formats with, e.g., pandoc or python-docutils.

craig

--
craig sanders 



Bug#824252: -t and -d don't work as expected

2016-05-14 Thread Craig Sanders
Package: python3-csvkit
Version: 0.9.1-3

The -t (--tabs) and -d (--delimiter) options only change the
input delimiter, while the output delimiter is hard-coded to
be a comma.

There doesn't appear to be any way to change the output delimiter.

Using tr or similar to work around this is unreliable - there may be
actual commas in any of the tab-separated fields.  It also defeats the
purpose of using specialised tools like csvcut.

e.g. see example usage of csvcut and comments at 
http://unix.stackexchange.com/a/186354/7696
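
A rough Python sketch of why a tr-style substitution is unreliable,
using made-up sample data (the record below is hypothetical). Blindly
replacing every comma splits the quoted field in two, while a real CSV
parser keeps it intact. This uses Python's standard csv module, not
csvkit itself.

```python
import csv
import io

# Hypothetical sample: the second field of the record contains a comma.
csv_text = 'id,name,city\n1,"Doe, Jane",Springfield\n'

# tr-style approach: replace every comma with a tab, quoted or not.
naive = csv_text.replace(",", "\t")
naive_fields = naive.splitlines()[1].split("\t")

# Parser-based approach: read as CSV, then write with a tab delimiter.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerows(csv.reader(io.StringIO(csv_text)))
parsed_fields = out.getvalue().splitlines()[1].split("\t")

print(len(naive_fields), naive_fields)
print(len(parsed_fields), parsed_fields)
```

The naive version yields one field too many for the record; the
parser-based version keeps the three original fields.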


The output delimiter should either be the same as the input delimiter
or there should be extra options (perhaps -T and -D) to set it
(similar to the way awk's FS and OFS variables work)
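
As a sketch of the behaviour being requested (an illustration only, not
how csvkit is implemented), here is a small Python function with
separate input and output delimiters, loosely analogous to FS and OFS.
The function name and parameter names are hypothetical; the output
delimiter defaults to the input delimiter.

```python
import csv
import io

def convert(text, in_delim=",", out_delim=None):
    """Re-delimit text; out_delim defaults to in_delim when not given."""
    if out_delim is None:
        out_delim = in_delim
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=in_delim)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=out_delim, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

# Hypothetical tab-separated input with a comma inside one field.
tsv = "id\tnote\n1\tred, green\n"

same = convert(tsv, in_delim="\t")                  # output stays tab-separated
piped = convert(tsv, in_delim="\t", out_delim="|")  # explicit output delimiter
```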

Thanks,

craig

-- 
craig sanders