RFD: concatenating textconv filters

2013-02-21 Thread Michael J Gruber
During my day-to-day UGFWIINIT I noticed that we don't do textconv
iteratively. E.g.: I have a file

SuperSecretButDumbFormat.pdf.gpg

and textconv filters with attributes set for *.gpg and *.pdf (using
gpg and pdftotext, respectively). For Git, the file has only the gpg
attribute, of course. In this case, I would have wanted to pass the gpg
output through pdftotext.

Now, I can set up an extra filter gpgtopdftotext for *.pdf.gpg (hoping
I get the ordering in .gitattributes right), of course, but I wonder
whether we could and should support concatenating filters by either

- making it easy to request it (say by setting
filter.gpgtopdftotext.textconvpipe to a list of textconv filter names
which are to be applied in sequence)

or

- doing it automatically (remove the pattern which triggered the filter,
and apply attributes again to the resulting pathspec)
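
The first option (an explicit list of filters applied in sequence) could be sketched roughly as below. This is only an illustration in Python, not something Git implements: run_textconv_pipe and the two stand-in converters are hypothetical names, and the sketch assumes every stage follows the existing textconv calling convention (one argument naming the file to convert, converted contents on stdout).

```python
import os
import subprocess
import sys
import tempfile


def run_textconv_pipe(data, commands):
    """Feed data (bytes) through each command in turn and return the result.

    Each command is an argv list. It is called with one extra argument,
    the name of a temporary file holding the current contents, and is
    expected to write the converted contents to stdout.
    """
    for argv in commands:
        fd, path = tempfile.mkstemp()
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(data)
            result = subprocess.run(argv + [path], check=True,
                                    stdout=subprocess.PIPE)
            data = result.stdout
        finally:
            os.unlink(path)
    return data


# Stand-in converters for demonstration only; real stages would be the
# programs configured as textconv for each driver.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.write(open(sys.argv[1]).read().upper())"]
REVERSE = [sys.executable, "-c",
           "import sys; sys.stdout.write(open(sys.argv[1]).read()[::-1])"]

converted = run_textconv_pipe(b"abc", [UPPER, REVERSE])
```

Writing each intermediate result to a temporary file keeps every stage usable as an ordinary textconv program on its own, at the cost of one extra file per stage.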

Maybe it's just not worth the effort. Or a nice GSoC project ;)

Michael
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFD: concatenating textconv filters

2013-02-21 Thread Junio C Hamano
Michael J Gruber g...@drmicha.warpmail.net writes:

> ... but wondering
> whether we could and should support concatenating filters by either
>
> - making it easy to request it (say by setting
> filter.gpgtopdftotext.textconvpipe to a list of textconv filter names
> which are to be applied in sequence)
>
> or
>
> - doing it automatically (remove the pattern which triggered the filter,
> and apply attributes again to the resulting pathspec)

I think what you are getting at is to start from something like this:

= .gitattributes =
*.gpg   diff=gpg
*.pdf   diff=pdf

and have "git cat-file --textconv frotz.pdf.gpg" (and other textconv
users) notice that:

 (1) The path matches *.gpg pattern, and calls for
 diff.gpg.textconv conversion.  This already happens in the
 current system.

 (2) After stripping the *.gpg pattern (i.e. look at the part of
 the path that matched the wildcard part * in the attribute
 selector), notice that the remainder, frotz.pdf, could match
 the *.pdf pattern.  The output from the previous filter could
 be treated as if it were a blob that is stored in that path.
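
Steps (1) and (2) can be sketched as a small loop. This is a toy model in Python under stated assumptions, not Git's attribute machinery: textconv_chain and match_wildcard are hypothetical names, patterns are limited to a single "*" matched against a plain file name, and the last matching rule wins.

```python
def match_wildcard(pattern, name):
    """Return the part of name matched by '*', or None if no match.

    Only patterns with exactly one '*' are handled in this sketch.
    """
    prefix, star, suffix = pattern.partition("*")
    if not star or "*" in suffix:
        raise ValueError("sketch only handles patterns with a single '*'")
    if len(name) < len(prefix) + len(suffix):
        return None
    if name.startswith(prefix) and name.endswith(suffix):
        return name[len(prefix):len(name) - len(suffix)]
    return None


def textconv_chain(name, rules):
    """List the diff drivers that would be applied to name, in order.

    rules is a list of (pattern, driver) pairs, like the lines of a
    .gitattributes file that set diff=<driver>.
    """
    chain = []
    while True:
        hit = None
        for pattern, driver in rules:
            stem = match_wildcard(pattern, name)
            if stem is not None:
                hit = (driver, stem)  # a later rule overrides an earlier one
        if hit is None:
            return chain
        driver, stem = hit
        chain.append(driver)
        if stem == name:  # a bare '*' strips nothing; stop to avoid looping
            return chain
        name = stem  # the "virtual path" for the next round


rules = [("*.gpg", "gpg"), ("*.pdf", "pdf")]
chain = textconv_chain("frotz.pdf.gpg", rules)
```

With the two rules above, the chain for frotz.pdf.gpg is gpg followed by pdf, and a name that matches neither pattern gets an empty chain.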

A few issues that need to be addressed while designing this feature
that come to my mind at random are:

 * This seems to call for a new concept, but what exactly is that
   concept?  Your RFD sounds as if you desire a "cascadable
   textconv", but it may be of a somewhat larger scope, a "virtual
   blob at a virtual path", which the last sentence in (2) above
   seems to suggest.

 * What is this new concept an attribute of?  If we express this as
   "the textconv conversion result of any path with attribute
   diff=gpg can be treated as the contents of a virtual blob", then
   we are making it an attribute of the gpg type, i.e.

= .git/config =
[diff "gpg"]
    textconv = gpg -v
    textconvProducesVirtualBlob = yes

   To me, that seems sufficient for this particular application at
   first glance, but are there other attributes that may want to
   produce such a virtual blob for further processing?  Is limiting
   this to textconv too restrictive?  I do not know.

 * What is the rule to come up with the virtual path to base the
   attribute look-up on for the virtual blob contents?  In the
   above example, the pattern was a simple *.gpg, and we used a
   naïve "what did the asterisk match?", but imagine a case where
   you have some documents that you want to run "gpg -v" on and some
   you don't.  You express this by having the former class of files
   named with a "conv-" prefix, or some convention that is convenient
   for you.

   Your .gitattributes may say something like:

= .gitattributes =
conv-*.gpg  diff=gpg

   When deciding what attributes to use to further process the
   result of conversion (i.e. virtual blob contents) for
   conv-frotz.pdf.gpg, what virtual path should we use?  Should we
   use conv-frotz.pdf, or just frotz.pdf?

   "The difference does not matter--either would work" is not a
   satisfactory answer, once you consider that you may want to have
   two or more classes of pdf files that you may want to treat
   differently, just like you did for gpg encrypted files in this
   example setting.  It seems to suggest that we want to use
   conv-frotz.pdf as the virtual path, but how would we derive that
   from the pattern conv-*.gpg and path conv-frotz.pdf.gpg?  It
   appears to me that you would need a way to say "between the two
   literal parts in the pattern, the conv- part needs to be kept but
   the .gpg part needs to be stripped" when forming the result.
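
   To make the two candidates concrete, here is a small Python sketch. The function name and the rule labels are made up for illustration, and it only handles patterns with a single "*"; it shows what each rule would yield, not how the choice would be expressed in .gitattributes.

```python
def virtual_path_candidates(pattern, name):
    """Return both candidate virtual paths for name, or None if no match.

    "wildcard_only" keeps just what '*' matched; "keep_prefix" also keeps
    the literal text before '*', and drops only the literal text after it.
    """
    prefix, star, suffix = pattern.partition("*")
    if not star or "*" in suffix:
        raise ValueError("sketch only handles patterns with a single '*'")
    long_enough = len(name) >= len(prefix) + len(suffix)
    if not (long_enough and name.startswith(prefix)
            and name.endswith(suffix)):
        return None
    stem = name[len(prefix):len(name) - len(suffix)]
    return {"wildcard_only": stem, "keep_prefix": prefix + stem}


candidates = virtual_path_candidates("conv-*.gpg", "conv-frotz.pdf.gpg")
# candidates["wildcard_only"] is "frotz.pdf"
# candidates["keep_prefix"] is "conv-frotz.pdf"
```

   Only the second rule would let a later conv-*.pdf pattern distinguish this file from a plain frotz.pdf, which is why the pattern alone does not say which literal parts to keep.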

