In case it wasn't clear, the whole message below was in reference to 
some proposed new CLIP-like standard for programs to follow.  I have 
little objection to a case-by-case addition of parsed output (such as 
here), although I'd encourage project teams to consider whether a full 
state dump is truly the best way to provide data to script authors.  (It 
may or may not be.)

    -- Garrett

Garrett D'Amore wrote:
> I still have a philosophical objection to the idea that we are going 
> to standardize some kind of tabular format for utilities to "dump" 
> their data for further massaging ("parsing") by shell scripts.
>
> I'm not opposed to the idea that shell scripts need to access the data 
> that is in these "databases", I just don't think that a general 
> opinion providing a way to dump the "whole" database for subsystem X 
> (whatever the subsystem is) is really the best approach, and I'm 
> fairly confident that whatever we settle upon, native parsing will 
> become difficult for at least some dialect.  (E.g., dealing with 
> escaped characters may be easy for a particular version of sh, but 
> what about for awk or perl, or for that matter Java?  There already 
> seems to be anecdotal evidence that even sh and ksh93 have some 
> annoying differences in their handling of read.)
>
> If we need programmatic access to this data from shell scripts and 
> such, then let's quit trying to solve the problem by dumping the entire 
> state at once to the shell, and offer utilities to extract the state 
> and present it in a format so that shell scripts *don't* have to 
> "parse" it.
>
> My favored option is still the -o type of solution, with some other 
> option indicating a lookup key (assuming that is pertinent).  I don't 
> think the ability to choose different delimiters is really that 
> important here, nor, IMO, is the ability to dump more than a single 
> field in an invocation.  Both of those wind up raising the whole 
> "parsing" question because you have to find a neutral delimiter, and 
> thus require token parsing of some sort.  (Hmm... that does still 
> leave open the issue of listing all the records, a la zfs list, but 
> *probably* it's safe to assume that we can separate records by newlines.)
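The -o style interface described above can be sketched with a shell function standing in for the utility.  The command name "subsysadm", its -k lookup key, and its fields are all invented for illustration; the point is only that the caller receives exactly one field per invocation and never tokenizes anything:

```shell
# Simulated utility following the proposed -o/-k interface:
# print exactly one field for one record, newline-terminated.
subsysadm() {
    key=$2 field=$4              # invoked as: subsysadm -k <key> -o <field>
    case "$key/$field" in
        net0/mtu)   echo 1500 ;;
        net0/state) echo up ;;
    esac
}

# The calling script never sees a delimiter, so there is nothing to parse.
mtu=$(subsysadm -k net0 -o mtu)
echo "mtu=$mtu"
```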
>
> That said, if, as a one-off solution, there is a desire to dump more 
> information at once, I don't see a problem with inventing a special 
> format for it.   I just don't think we're likely to standardize on one 
> that works everywhere.  Instead, we should, IMO, discourage the 
> creation of solutions which require token separation to be performed 
> by shell scripts.
>
> Alternatively, we can provide tools that perform general format 
> parsing on behalf of the shells, and have the parseable output come 
> in a format those tools understand.  (The tools I'm talking about 
> would perform lookup and field extraction on behalf of the calling 
> script.)  I'd advise in such a case against inventing yet another 
> new file format, though.  (I think I already mentioned XML.  Likely 
> something simpler, such as CSV or tab-delimited fields, would be 
> more palatable.  It would certainly make processing easier for 
> languages that don't already have XML support.)
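Such a lookup-and-extract helper over tab-delimited records might look like the sketch below.  The name "fldget" and the sample records are invented; the point is that the calling script asks for a record and a field, and the helper does all the splitting:

```shell
# Hypothetical helper: look up a record by key in tab-delimited input
# and print one field, so callers never split fields themselves.
fldget() {   # usage: fldget <key> <field-number>   (records on stdin)
    awk -F '\t' -v key="$1" -v fld="$2" '$1 == key { print $fld }'
}

# Example: extract field 2 of the record keyed "net1".
printf 'net0\t1500\tup\nnet1\t9000\tdown\n' | fldget net1 2
```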
>
>    -- Garrett
>
> John Plocher wrote:
>> Darren Reed wrote:
>>> To bring this back to where it started, the issues are (for PSARC):
>>> - given that there will be future work that wants to generate
>>>  parsable output, do we need an opinion written up (for this case)
>>>  to serve as the notice of our decision about it or is it sufficient
>>>  to just cite this case?
>>
>> No opinion should be needed - though a best practice (written by Joe
>> or Garrett or Nico or you or...) that summarizes this into something
>> reusable would be good.
>>
>> Unlike Joe, I do not believe this is a one-off - we need structure
>> and consistency in this area, and this case (like zoneadm) presents
>> a reasonable way to provide it (*if*, in fact, the project team can
>> solve the escape-sequence parsing problem).
>>
>> To me, that structure is:
>>
>>     We (the ARC, Sun,...) do not want every utility to do
>>     one-off parsable output formats if we can help it - or
>>     to use different CLI utterances to obtain it.  We want
>>     the output to be easily usable in the places where we
>>     expect it to be commonly used - shells, scripting languages,
>>     etc.  And we don't need to handle every conceivable future
>>     possibility as part of this case.
>>
>>    A spec that would work for me would simply say:
>>
>>        use  command -t ':' -p -o xx,yy,zz
>>
>>    to get tabular, ':'-delimited, and properly escaped output.
>>
>>      Here are examples of how to use this output:
>>
>>      ksh93: ...
>>      perl: ....
>>      fortran: ... :-) ...
>>
>>
>>> - if we're going to use this case as the foundation for all future
>>>  cases that are presenting output from commands, such as these,
>>>  that is meant to be parsable, do we:
>>>  1) decide that we insist that commands use -o/-p unless history
>>>     prevents it? (i.e. new commands *MUST* use this combination)
>>
>> If new commands choose to provide parsable output, the CLIP
>> guidelines strongly suggest use of a common CLI term.  "-p" seems
>> to be the one we have de facto standardized upon.
>>
>> Same for "-o aaa,bbb,ccc".  And ":" as a separator.
>>
>> Wishing that we didn't have to parse command output so we wouldn't
>> have to address this issue is IMO naive.  The fact remains that
>> it is common, useful and expedient to provide this type of data
>> in tabular multiline form.  If it turns out that it isn't easily
>> parsable in shell, then we'll all just use perl or whatever - and
>> not lose any sleep over it.  Getting access to the data is the key
>> enabler here - its exact format is secondary - if I can't get the
>> data in the first place, it doesn't matter what format it isn't in.
>>
>> A revised spec would be good.
>>
>>   -John
>>
>

