This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

hooks in pack/unpack

=head1 VERSION

  Maintainer: Ilya Zakharevich <[EMAIL PROTECTED]>
  Date: 16 September 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 250
  Version: 1
  Status: Developing

=head1 ABSTRACT

How to specify pack()/unpack() user-recipes.

=head1 DESCRIPTION

The following enhancement covers almost all of the remaining ways
to store binary data, but it is substantially higher on the "bizarreness"
scale:

C<'R[ID,TYPE]'> in a TEMPLATE: during unpack(), extracts a value as TYPE
and uses it to choose among several templates.  Behaves as if
C<R[ID,TYPE]> were replaced by the chosen template.  The candidate
templates are found by indexing the array, looking up the key in the hash,
or calling the subroutine referenced by C<$unpack::recipes[ID]>, using the
extracted value.

Similarly, C<'R{ID,TYPE}'> uses C<$unpack::recipes{ID}> instead.
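
A minimal sketch, in today's Perl 5, of how the lookup could dispatch on the
kind of reference stored in C<$unpack::recipes[ID]> or
C<$unpack::recipes{ID}>.  The helper C<choose_recipe> and the sample recipe
tables are hypothetical; nothing here is part of an existing pack()/unpack()
implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: given the reference found in the recipe table and the
# selector value extracted as TYPE, return the template to use.
sub choose_recipe {
    my ($recipe, $key) = @_;
    my $kind = ref $recipe;
    return $recipe->[$key] if $kind eq 'ARRAY';   # index an array
    return $recipe->{$key} if $kind eq 'HASH';    # look up a hash key
    return $recipe->($key) if $kind eq 'CODE';    # call a subroutine
    die "recipe must be an ARRAY, HASH or CODE reference\n";
}

# Illustrative recipe tables of each kind.
my $by_index = [ 'C', 'n' ];
my $by_name  = { 1 => 'N', 2 => 'a4' };
my $by_code  = sub { $_[0] ? 'n' : 'C' };

print choose_recipe($by_index, 1), "\n";   # n
print choose_recipe($by_name,  2), "\n";   # a4
print choose_recipe($by_code,  0), "\n";   # C
```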

The returned template may be replaced by a reference to an array of the form

  [TEMPLATE, \&POSTPROCESS]

In such a case, a value is extracted with TEMPLATE and then postprocessed
by calling C<POSTPROCESS($extracted)>; the return value replaces the
extracted value.
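
The postprocessing step can be sketched the same way.  The hypothetical
helper C<apply_entry> accepts either a plain template or a
C<[TEMPLATE, \&POSTPROCESS]> pair and, for the pair, returns whatever
POSTPROCESS returns in place of the extracted value.  The doubling
subroutine is only an illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: extract one value from $bytes according to a recipe
# entry, which is either a plain template string or [TEMPLATE, \&POSTPROCESS].
sub apply_entry {
    my ($entry, $bytes) = @_;
    if (ref $entry eq 'ARRAY') {
        my ($template, $post) = @$entry;
        my ($extracted) = unpack $template, $bytes;
        return $post->($extracted);       # return value replaces the extracted one
    }
    my ($value) = unpack $entry, $bytes;  # plain template: no postprocessing
    return $value;
}

my $plain  = apply_entry('n', "\x01\x02");                         # 0x0102
my $scaled = apply_entry([ 'n', sub { $_[0] * 2 } ], "\x01\x02");  # 0x0204
printf "%#x %#x\n", $plain, $scaled;
```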

Optional: one should be able to specify that some bits of the extracted
selector value are ignored.  C<'R[ID,FROM..TO,TYPE]'> uses only bits FROM
through TO (shifted right by FROM) as the index.  C<'R[ID,mod,TYPE]'> uses
the extracted value modulo the length of the array referenced by
C<$unpack::recipes[ID]>.
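
The two optional index forms reduce to ordinary integer arithmetic.  A
sketch with hypothetical helper names; the C<FROM..TO> range is taken as
inclusive at both ends, an assumption consistent with the C<7..7> selector
used in the UTF8 example.

```perl
use strict;
use warnings;

# 'FROM..TO' (hypothetical helper): keep bits FROM through TO, shifted right by FROM.
sub bit_range_index {
    my ($value, $from, $to) = @_;
    my $width = $to - $from + 1;
    return ($value >> $from) & ((1 << $width) - 1);
}

# 'mod' (hypothetical helper): reduce the value modulo the number of recipes.
sub mod_index {
    my ($value, $recipes) = @_;
    return $value % scalar @$recipes;
}

print bit_range_index(0xC3, 7, 7), "\n";       # 1: lead byte of a 2-byte sequence
print bit_range_index(0x41, 7, 7), "\n";       # 0: plain ASCII
print bit_range_index(0xC3, 5, 7), "\n";       # 6 (binary 110)
print mod_index(10, [ 'C', 'n', 'N' ]), "\n";  # 1
```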

This extracts a UTF-8 character whose encoded form is at most 2 bytes long:

  sub utf8_2byte_postprocess { (($_[0] & 0x1F00) >> 2) | ($_[0] & 0x3F) }

  local $unpack::recipes{UTF8} = [ 'C', [ 'n', \&utf8_2byte_postprocess ] ];
  $n = unpack 'R{UTF8,7..7,C}', $str;
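
Since C<'R'> does not exist in current Perl, the following sketch emulates
C<unpack 'R{UTF8,7..7,C}'> over a whole string with today's unpack() and
substr().  It assumes, as the example appears to require, that the selector
byte read as C<C> is only peeked at: the chosen template (C<C> or C<n>)
re-reads the string from the same position.  The names
C<decode_utf8_2byte> and C<%width> are hypothetical, and like the recipe
itself the sketch handles only characters encoded in 1 or 2 bytes.

```perl
use strict;
use warnings;

sub utf8_2byte_postprocess { (($_[0] & 0x1F00) >> 2) | ($_[0] & 0x3F) }

# Recipe table as in the example; %width records how many bytes each
# template consumes (a hypothetical helper table).
my @recipe = ( 'C', [ 'n', \&utf8_2byte_postprocess ] );
my %width  = ( C => 1, n => 2 );

sub decode_utf8_2byte {
    my ($str) = @_;
    my ($pos, @chars) = (0);
    while ($pos < length $str) {
        my $selector = unpack 'C', substr($str, $pos, 1);  # peek one byte
        my $entry    = $recipe[ ($selector >> 7) & 1 ];    # bits 7..7
        my ($template, $post) = ref($entry) ? @$entry : ($entry);
        my $value = unpack $template, substr($str, $pos, $width{$template});
        push @chars, $post ? $post->($value) : $value;
        $pos += $width{$template};
    }
    return @chars;
}

my @cp = decode_utf8_2byte("A\xC3\xA9");
print join(',', map { sprintf 'U+%04X', $_ } @cp), "\n";   # U+0041,U+00E9
```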

Symmetrically, during pack(), C<'R[ID]'> and C<'R{ID}'> look up ID in
C<@pack::recipes> and C<%pack::recipes> respectively.  The resulting array,
hash, or subroutine reference is indexed by, keyed by, or called with the
next argument to pack(), and the result is appended to the target string.

The pack() counterpart of the UTF8 example above:

  sub utf8_2byte_save {
    return pack 'C', $_[0] if $_[0] <= 127;      # 1 byte:  0xxxxxxx
    pack 'n', 0xC080 | (($_[0] & 0x7C0) << 2) | ($_[0] & 0x3F);  # 110xxxxx 10xxxxxx
  }

  local $pack::recipes{UTF8} = \&utf8_2byte_save;
  $str = pack 'R{UTF8}', $n;
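
A sketch emulating C<pack 'R{UTF8}'> over several arguments with today's
Perl: because the recipe is a subroutine, it is called once per argument
and the results are concatenated.  The 2-byte branch uses 0xC080 as the
marker constant, so the lead byte C<110xxxxx> precedes the continuation
byte C<10xxxxxx>; the output is cross-checked against Perl's built-in
C<utf8::encode>.

```perl
use strict;
use warnings;

# pack-side recipe: 1 byte for ASCII, otherwise a big-endian 16-bit value
# whose high byte is 110xxxxx and low byte is 10xxxxxx.
sub utf8_2byte_save {
    return pack 'C', $_[0] if $_[0] <= 127;
    return pack 'n', 0xC080 | (($_[0] & 0x7C0) << 2) | ($_[0] & 0x3F);
}

# Emulates pack 'R{UTF8}' applied to each argument in turn.
my $str = join '', map { utf8_2byte_save($_) } 0x41, 0xE9, 0x7FF;

# Cross-check against Perl's own UTF-8 encoder.
my $expected = "\x{41}\x{E9}\x{7FF}";
utf8::encode($expected);
print $str eq $expected ? "ok\n" : "mismatch\n";
```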

Optionally, to allow the same TEMPLATE to be used with both pack() and
unpack(), anything after the first comma in the argument to C<'R'> is
ignored during pack(), so that

  $str = pack 'R{UTF8}',        $n;
  $str = pack 'R{UTF8,7..7,C}', $n;

are equivalent.
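
One way the pack() side could honour this rule is to parse the C<'R'> item
and keep only the text before the first comma as ID.  The parser
C<pack_recipe_id> is a hypothetical sketch; in particular, it assumes the
ID itself never contains a comma or a closing bracket.

```perl
use strict;
use warnings;

# Hypothetical parser for the pack() side of an 'R' item: returns the kind
# of recipe table (array for R[...], hash for R{...}) and the ID, ignoring
# everything after the first comma.
sub pack_recipe_id {
    my ($item) = @_;
    my ($kind, $body);
    if    ($item =~ /^R\[([^\]]*)\]$/) { ($kind, $body) = ('array', $1) }
    elsif ($item =~ /^R\{([^}]*)\}$/)  { ($kind, $body) = ('hash',  $1) }
    else  { die "not an R item: $item\n" }
    my ($id) = split /,/, $body, 2;   # keep only the text before the first comma
    return ($kind, $id);
}

my @a = pack_recipe_id('R{UTF8}');
my @b = pack_recipe_id('R{UTF8,7..7,C}');
print "@a | @b\n";   # hash UTF8 | hash UTF8
```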

Using pack() with I<only one> C<'R'> type inside is overkill, but C<'R'>
comes in very handy as part of a more complicated construct, as in

  $str = pack 'N/R{UTF8,7..7,C}', @array;

or

  $str = pack 'N/( \g[ N/R{UTF8,7..7,C} ] )', @array_of_arrays;
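
The first of these constructs can be emulated with existing pack()/unpack()
calls: a 32-bit big-endian count followed by each element encoded by the
UTF8 recipe, and the reverse.  The helper names are hypothetical, the
unpack side inlines the bit-7 selector test as C<$lead & 0x80>, and the
nested C<\g[...]> form is not emulated.

```perl
use strict;
use warnings;

sub utf8_2byte_save {
    return pack 'C', $_[0] if $_[0] <= 127;
    return pack 'n', 0xC080 | (($_[0] & 0x7C0) << 2) | ($_[0] & 0x3F);
}

# Emulates pack 'N/R{UTF8,...}', @array: count, then each encoded element.
sub pack_counted_utf8 {
    my @values = @_;
    return pack('N', scalar @values) . join('', map { utf8_2byte_save($_) } @values);
}

# Emulates the matching unpack: read the count, then that many characters,
# peeking at bit 7 of each lead byte to choose a 1- or 2-byte template.
sub unpack_counted_utf8 {
    my ($str) = @_;
    my $count = unpack 'N', $str;
    my ($pos, @out) = (4);
    for (1 .. $count) {
        my $lead = unpack 'C', substr($str, $pos, 1);
        if ($lead & 0x80) {
            my $n = unpack 'n', substr($str, $pos, 2);
            push @out, (($n & 0x1F00) >> 2) | ($n & 0x3F);
            $pos += 2;
        } else {
            push @out, $lead;
            $pos += 1;
        }
    }
    return @out;
}

my @array = (0x48, 0xE9, 0x3B1);
my $str   = pack_counted_utf8(@array);
my @back  = unpack_counted_utf8($str);
print "@back\n";   # 72 233 945
```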

In addition to "funny ways to encode simple data", the same proposal
handles streams that consist of repeated blocks of the form

  int type; union { struct type_0 t0; struct type_1 t1; ... }

as well as many other similar problems.
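
A sketch of reading such a stream with a hash of per-type layouts, which is
the role C<%unpack::recipes> would play.  The three struct layouts, the use
of C<N> for C<int type>, the helper C<read_blocks>, and the choice to size
each block by its own member (a real C union would be padded to its largest
member) are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical record layouts, keyed by the leading 'int type' field.
my %layout = (
    0 => { template => 'N',   size => 4 },   # struct type_0 { uint32 id; }
    1 => { template => 'n n', size => 4 },   # struct type_1 { uint16 x, y; }
    2 => { template => 'Z8',  size => 8 },   # struct type_2 { char name[8]; }
);

# Walk a stream of (type, union) blocks, choosing each body template by type.
sub read_blocks {
    my ($stream) = @_;
    my ($pos, @records) = (0);
    while ($pos < length $stream) {
        my $type = unpack 'N', substr($stream, $pos, 4);
        my $l = $layout{$type} or die "unknown type $type\n";
        my @fields = unpack $l->{template}, substr($stream, $pos + 4, $l->{size});
        push @records, [ $type, @fields ];
        $pos += 4 + $l->{size};
    }
    return @records;
}

my $stream = pack('N N', 0, 42) . pack('N n n', 1, 3, 4) . pack('N Z8', 2, 'perl');
for my $rec (read_blocks($stream)) {
    print join(',', @$rec), "\n";
}
```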

=head1 MIGRATION ISSUES

None.

=head1 IMPLEMENTATION

Straightforward.

=head1 REFERENCES

RFC 142: Enhanced Pack/Unpack

RFC 246: pack/unpack uncontroversial enhancements

RFC 248: enhanced groups in pack/unpack
