In case it wasn't clear, the whole message below was in reference to
some proposed new CLIP-like standard for programs to follow. I have
little objection to a case-by-case addition of parsed output (such as
here), although I'd encourage project teams to consider whether a full
state dump is truly the best way to provide data to script authors. (It
may or may not be.)
-- Garrett
Garrett D'Amore wrote:
> I still have a philosophical objection to the idea that we are going
> to standardize some kind of tabular format for utilities to "dump"
> their data for further massaging ("parsing") by shell scripts.
>
> I'm not opposed to the idea that shell scripts need to access the data
> that is in these "databases"; I just don't think that a general
> opinion providing a way to dump the "whole" database for subsystem X
> (whatever the subsystem is) is really the best approach, and I'm
> fairly confident that whatever we settle upon, native parsing will
> become difficult for at least some dialect. (E.g. dealing with
> escaped characters may be easy for a particular version of sh, but
> what about for awk or perl or for that matter Java? There already
> seems to be anecdotal evidence that even sh versus ksh93 have some
> annoying differences in their handling of read.)
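As an illustration of the dialect issue above, here is a minimal POSIX-sh
sketch of splitting one colon-delimited record. The record itself is
invented; `-r` is the flag whose absence is one of the `read` behaviors
that historically differed between shell dialects.

```shell
# Invented sample record, as a utility might emit it.
line='web01:192.0.2.10:up'

# Split on ':' with a one-shot IFS; -r disables backslash
# interpretation, which is handled differently across dialects
# when it is omitted.
IFS=':' read -r host addr state <<EOF
$line
EOF

echo "$host"   # web01
echo "$state"  # up
```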
>
> If we need programmatic access to this data from shell scripts and
> such, then let's quit trying to solve the problem by dumping the entire
> state at once to the shell, and offer utilities to extract the state
> and present it in a format so that shell scripts *don't* have to
> "parse" it.
>
> My favored option is still the -o type of solution, with some other
> option indicating a lookup key (assuming that is pertinent). I don't
> think the ability to choose different delimiters is really that
> important here, nor, IMO, is the ability to dump more than a single
> field in an invocation. Both of those wind up raising the whole
> "parsing" question because you have to find a neutral delimiter, and
> thus require token parsing of some sort. (Hmm... that does still
> leave open the issue of listing all the records, a la zfs list, but
> *probably* it's safe to assume that we can separate records by newlines.)
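A sketch of the single-field, single-key interface argued for above.
Since no real utility is specified in the thread, a shell function stands
in for one; the name `show_field`, the key `web01`, and the data are all
invented for illustration.

```shell
# Stand-in for a hypothetical utility: look up one record by key
# and print exactly one field, so the caller never tokenizes.
show_field() {
    case "$1:$2" in
        web01:address) echo 192.0.2.10 ;;
        web01:state)   echo up ;;
        *)             return 1 ;;
    esac
}

# One value per invocation; no delimiters, no escaping, no parsing.
addr=$(show_field web01 address)
echo "$addr"   # 192.0.2.10
```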
>
> That said, if, as a one-off solution, there is a desire to dump more
> information at once, I don't see a problem with inventing a special
> format for it. I just don't think we're likely to standardize on one
> that works everywhere. Instead, we should, IMO, discourage the
> creation of solutions which require token separation to be performed
> by shell scripts.
>
> Alternatively, we can provide tools which perform general format
> parsing on behalf of the shells, and define the parsable output
> format accordingly. (The tools I'm talking about would perform lookup and
> field extraction on behalf of the calling script.) I'd advise in such
> a case against inventing yet another new file format though. (I
> think I already mentioned XML. Likely something simpler, such as CSV
> or tab-delimited fields, would be more palatable. It would certainly
> make processing easier for languages that don't already have XML
> support.)
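One way the extraction idea above could look: with tab-delimited output
(the simpler alternative to XML just mentioned), a standard tool like
cut already does the field extraction, so the calling script never
parses anything itself. The data line is invented.

```shell
# Tab-delimited record (invented data); cut pulls a single column
# without any shell word-splitting or escape handling.
printf 'web01\t192.0.2.10\tup\n' | cut -f2   # 192.0.2.10
```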
>
> -- Garrett
>
> John Plocher wrote:
>> Darren Reed wrote:
>>> To bring this back to where it started, the issues are (for PSARC):
>>> - given that there will be future work that wants to generate
>>> parsable output, do we need an opinion written up (for this case)
>>> to serve as the notice of our decision about it or is it sufficient
>>> to just cite this case?
>>
>> No opinion should be needed - though a best practice (written by Joe
>> or Garrett or Nico or you or...) that summarizes this into something
>> reusable would be good.
>>
>> Unlike Joe, I do not believe this is a one-off - we need structure
>> and consistency in this area, and this case (like zoneadm) presents
>> a reasonable way to provide it *if*, in fact, the project team can
>> solve the escape-sequence parsing problem.
>>
>> To me, that structure is:
>>
>> We (the ARC, Sun,...) do not want every utility to do
>> one-off parsable output formats if we can help it - or
>> to use different CLI utterances to obtain it. We want
>> the output to be easily usable in the places where we
>> expect it to be commonly used - shells, scripting languages,
>> etc. And we don't need to handle every conceivable future
>> possibility as part of this case.
>>
>> A spec that would work for me would say simply
>> use
>> command -t ':' -p -o xx,yy,zz
>> to get tabular, ':' delimited and properly escaped output
>>
>> Here are examples of how to use this output:
>>
>> ksh93: ...
>> perl: ....
>> fortran: ... :-) ...
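For instance, a consumer of the proposed `command -t ':' -p -o xx,yy,zz`
form might look like the following; the command and its output are
simulated with printf, since the utility itself is hypothetical and the
records are invented.

```shell
# Simulated output of the proposed form, e.g.
#   command -t ':' -p -o name,addr,state
printf '%s\n' 'web01:192.0.2.10:up' 'web02:192.0.2.11:down' |
awk -F: '$3 == "up" { print $1 }'   # web01
```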
>>
>>> - if we're going to use this case as the foundation for all future
>>> cases that are presenting output from commands, such as these,
>>> that is meant to be parsable, do we:
>>> 1) decide that we insist that commands use -o/-p unless history
>>> prevents it? (i.e. new commands *MUST* use this combination)
>>
>> If new commands choose to provide parsable output, the CLIP
>> guidelines strongly suggest use of a common CLI term. "-p" seems
>> to be the one we have de facto standardized upon.
>>
>> Same for "-o aaa,bbb,ccc". And ":" as a separator.
>>
>> Wishing that we didn't have to parse command output so we wouldn't
>> have to address this issue is IMO naive. The fact remains that
>> it is common, useful, and expedient to provide this type of data
>> in tabular multiline form. If it turns out that it isn't easily
>> parsable in shell, then we'll all just use perl or whatever - and
>> not lose any sleep over it. Getting access to the data is the key
>> enabler here - its exact format is secondary - if I can't get the
>> data in the first place, it doesn't matter what format it isn't in.
>>
>> A revised spec would be good.
>>
>> -John
>>
>