I still have a philosophical objection to the idea that we are going to
standardize some kind of tabular format for utilities to "dump" their
data for further massaging ("parsing") by shell scripts.
I'm not opposed to the idea that shell scripts need to access the data
that is in these "databases"; I just don't think that a general opinion
providing a way to dump the "whole" database for subsystem X (whatever
the subsystem is) is really the best approach, and I'm fairly confident
that whatever format we settle upon, native parsing will become
difficult in at least some dialect. (E.g., dealing with escaped
characters may be easy for a particular version of sh, but what about
for awk or perl or, for that matter, Java? There already seems to be
anecdotal evidence that even sh and ksh93 have some annoying
differences in their handling of read.)
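To illustrate the kind of dialect trouble I mean, here's a minimal
POSIX sh sketch (the record data is made up) showing how a producer's
backslash-escaped delimiter fares depending on whether the consumer
remembers read's -r flag:

```shell
# Suppose a utility escapes an embedded delimiter as '\:' in its output.
# Whether that escaping survives depends on how the consumer calls read.

line='a\:b:c'    # one record: fields "a:b" and "c", colon-delimited

# Raw read (-r): the backslash is literal, so the field splits wrongly.
IFS=: read -r x y <<EOF
$line
EOF
printf 'with -r:    x=%s y=%s\n' "$x" "$y"    # x=a\  y=b:c

# Default read: the backslash escapes the colon, so the split is right,
# but the backslash itself is consumed.
IFS=: read x y <<EOF
$line
EOF
printf 'without -r: x=%s y=%s\n' "$x" "$y"    # x=a:b y=c
```

The point is only that every consumer (sh read, awk -F, perl split)
has its own escaping rules, so one escaped format won't parse natively
everywhere.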
If we need programmatic access to this data from shell scripts and such,
then let's quit trying to solve the problem by dumping the entire state
at once to the shell, and offer utilities to extract the state and
present it in a format so that shell scripts *don't* have to "parse" it.
My favored option is still the -o type of solution, with some other
option indicating a lookup key (assuming that is pertinent). I don't
think the ability to choose different delimiters is really that
important here, nor, IMO, is the ability to dump more than a single
field in an invocation. Both of those wind up raising the whole
"parsing" question, because you have to find a neutral delimiter, which
in turn requires token parsing of some sort. (Hmm... that does still
leave open the issue of listing all the records a la zfs list, but
*probably* it's safe to assume that we can separate records with
newlines.)
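To make the shape of that concrete, here's a sketch using a stand-in
shell function demo_db (the name, fields, and data are all invented)
in place of a real subsystem utility. The caller asks for exactly one
field of one record, so the value arrives verbatim and nothing ever
needs to be token-parsed:

```shell
# demo_db FIELD KEY -- stand-in for "utility -o FIELD KEY": print one
# field of one record and nothing else. All data here is invented.
demo_db() {
    case "$2:$1" in
        web01:state) printf '%s\n' 'running' ;;
        web01:mount) printf '%s\n' '/export/a b/c' ;;  # spaces do no harm
        *) return 1 ;;
    esac
}

# The caller never splits anything; command substitution is enough.
state=$(demo_db state web01)
mountpt=$(demo_db mount web01)
printf '%s\n' "$state" "$mountpt"
```

Even an embedded space or delimiter character in the value is harmless,
because the whole output of one invocation *is* the value.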
That said, if, as a one-off solution, there is a desire to dump more
information at once, I don't see a problem with inventing a special
format for it. I just don't think we're likely to standardize on one
that works everywhere. Instead, we should, IMO, discourage the creation
of solutions which require token separation to be performed by shell
scripts.
Alternatively, we can provide tools which perform general format
parsing on behalf of the shells, and have the parseable output come in
a format those tools understand. (The tools I'm talking about would
perform lookup and field extraction on behalf of the calling script.)
I'd advise in such a case against inventing yet another new file
format, though. (I think I already mentioned XML. Likely something
simpler, such as CSV or tab-delimited fields, would be more palatable.
It would certainly make processing easier for languages that don't
already have XML support.)
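As a sketch of what such a helper could look like (the tsv_get name
and the sample table are invented for illustration), a small awk
wrapper can do the lookup and field extraction so the calling script
never touches a delimiter itself:

```shell
# tsv_get COLUMN KEY -- read a tab-separated table with a header row on
# stdin; print COLUMN for the record whose first field equals KEY.
# The helper name and the sample data below are invented.
tsv_get() {
    awk -F'\t' -v col="$1" -v key="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
        c && $1 == key { print $c }
    '
}

# A table some hypothetical "command -p" might emit:
table=$(printf 'name\tstate\tmount\nweb01\trunning\t/export/web01\ndb01\tstopped\t/export/db01\n')

printf '%s\n' "$table" | tsv_get state db01    # prints "stopped"
```

The script asks for a column by name and a record by key; the one place
that knows the delimiter and escaping rules is the helper.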
-- Garrett
John Plocher wrote:
> Darren Reed wrote:
>> To bring this back to where it started, the issues are (for PSARC):
>> - given that there will be future work that wants to generate
>> parsable output, do we need an opinion written up (for this case)
>> to serve as the notice of our decision about it or is it sufficient
>> to just cite this case?
>
> No opinion should be needed - though a best practice (written by Joe
> or Garrett or Nico or you or...) that summarizes this into something
> reusable would be good.
>
> Unlike Joe, I do not believe this is a one-off - we need structure
> and consistency in this area, and this case (like zoneadm) presents
> a reasonable way to provide it *if*, in fact, the project team can
> solve the escape-sequence parsing problem.
>
> To me, that structure is:
>
> We (the ARC, Sun,...) do not want every utility to do
> one-off parsable output formats if we can help it - or
> to use different CLI utterances to obtain it. We want
> the output to be easily usable in the places where we
> expect it to be commonly used - shells, scripting languages,
> etc. And we don't need to handle every conceivable future
> possibility as part of this case.
>
> A spec that would work for me would say simply
> use
> command -t ':' -p -o xx,yy,zz
> to get tabular, ':' delimited and properly escaped output
>
> Here are examples of how to use this output:
>
> ksh93: ...
> perl: ....
> fortran: ... :-) ...
>
>> - if we're going to use this case as the foundation for all future
>> cases that are presenting output from commands, such as these,
>> that is meant to be parsable, do we:
>> 1) decide that we insist that commands use -o/-p unless history
>> prevents it? (i.e. new commands *MUST* use this combination)
>
> If new commands choose to provide parsable output, the CLIP
> guidelines strongly suggest use of a common CLI term. "-p" seems
> to be the one we have de facto standardized upon.
>
> Same for "-o aaa,bbb,ccc". And ":" as a separator.
>
> Wishing that we didn't have to parse command output so we wouldn't
> have to address this issue is IMO naive. The fact remains that
> it is common, useful and expedient to provide this type of data
> in tabular multiline form. If it turns out that it isn't easily
> parsable in shell, then we'll all just use perl or whatever - and
> not lose any sleep over it. Getting access to the data is the key
> enabler here - its exact format is secondary - if I can't get the
> data in the first place, it doesn't matter what format it isn't in.
>
> A revised spec would be good.
>
> -John
>