This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Add reshape() for multi-dimensional array reshaping

=head1 VERSION

  Maintainer: Nathan Wiger <[EMAIL PROTECTED]>
  Date: 24 Aug 2000
  Version: 1
  Mailing List: [EMAIL PROTECTED]
  Number: 148
  Status: Developing

=head1 ABSTRACT

Currently, there is no easy way to reshape existing arrays into
multiple arrays or matrices. This makes nifty array manipulation and
complex math hard.

Two RFCs, 90 and 91, describe highly specific solutions. However, these
are not extensible. A more general-purpose tool that can do arbitrary
multi-dimensional array reshaping is a better choice for core. Other
functions can then simply be specialized forms of this builtin.

=head1 DESCRIPTION

Let's jump in. This RFC proposes a C<reshape> builtin with the
following syntax:

   @reshaped = reshape $x, $y, $i, @array [, @array ...]

Here C<$x> and C<$y> are the length and number of the arrays produced,
respectively (they can also be thought of as the x and y of a matrix,
hence the notation). C<$i> specifies the interleave of the elements:
0 specifies no interleave, whereas 1 specifies to interleave across
lists. There is currently no meaning to C<< $i > 1 >>, but this may be
added later.

If called in the one-list form, then that list is split into multiple
other lists. If called with more than one list, then those lists are
joined back together into a single list. In both cases, C<$x>, C<$y>,
and C<$i> are used in the same way to determine how the lists are
reshaped.

The dimensions are subject to the following properties:

=over 4

=item 1.

Less data than specified causes C<reshape> to return undef

=item 2.

More data than specified is silently discarded

=back

If either the C<$x> or C<$y> dimension is undef or 0, then it is
treated as a wildcard. See below.

=head2 Single Array Form - SPLIT

When one array is passed in, it is split up. Here, C<$x> and C<$y>
determine the dimensions of the resulting lists, and C<$i> determines
the interleave.
For example, assume C<reshape> is called with the list C<(1..23)> in
the following forms:

   $x,$y,$i    Results
   --------    ------------------------------------------
   3, 2, 0     ( [1,2,3], [4,5,6] )
   2, 4, 0     ( [1,2],[3,4],[5,6],[7,8] )
   3, 2, 1     ( [1,3,5], [2,4,6] )
   3, 3, 1     ( [1,4,7], [2,5,8], [3,6,9] )
   14,20,1     undef - not enough data to fill

Notice how the dimensions work together to C<reshape> the arrays. As
such, the combination of the arguments is more significant than the
individual arguments themselves. Also, note that any excess data left
over after the dimensions have been fulfilled is discarded.

In the final example, undef is returned, allowing you to easily check
if you have enough data:

   @matrix = reshape 14, 20, 1, @input or die "Not enough data!";

In addition, wildcards can be used. With a fixed C<$y>, only that many
lists are returned. However, with a wildcard C<$y>, as many
C<$x>-long lists as the data can fill are returned:

   $x,$y,$i    Results
   --------    ------------------------------------------
   4, 0, 1     ( [1,6,11,16], [2,7,12,17], [3,8,13,18],
                 [4,9,14,19], [5,10,15,20] )   # lose 21, 22, 23

Note that we lose data here because the input does not divide into a
whole number of lists of length C<$x>.

With a fixed C<$x>, every list returned I<must> have that fixed
length. However, with a wildcard C<$x>, lists will be expanded to fill
the number specified by C<$y>, even in mismatched sizes:

   $x,$y,$i    Results
   --------    ------------------------------------------
   0, 2, 1     ( [1,3,5,7,9,11,13,15,17,19,21,23],
                 [2,4,6,8,10,12,14,16,18,20,22] )   # unzip
   0, 6, 0     ( [1,2,3,4], [5,6,7,8], [9,10,11,12],
                 [13,14,15,16], [17,18,19,20],
                 [21,22,23] )                       # partition

Here, all the data is guaranteed to be preserved. It is simply split
into exactly the number of parts specified by C<$y>, even if that
results in some lists being smaller.

=head2 Multiple Array Form - JOIN

In this form, multiple arrays are joined back together. Here, C<$x>,
C<$y>, and C<$i> specify the dimensions to use to rejoin the lists, not
to split them up.
The dimensions simply work in reverse: rather than specifying how many
lists to create, they specify which elements of the input lists are
joined back together.

So, we'll assume an input array of the form:

   ( [1,4,7,10], [2,5,8], [3,6,9] )

which is passed to C<reshape> with the following dimensions:

   $x,$y,$i    Results
   --------    ------------------------------------------
   0, 0, 1     ( 1,2,3,4,5,6,7,8,9,10 )   # zip
   0, 0, 0     ( 1,4,7,10,2,5,8,3,6,9 )   # simple concat
   3, 0, 1     ( 1,2,3,4,5,6,7,8,9 )      # 3 vals from all lists
   0, 2, 1     ( 1,2,4,5,7,8,10 )         # all vals from 2 lists
   3, 2, 1     ( 1,2,4,5,7,8 )            # 3 vals x 2 lists

Hopefully this is easy to understand. C<$x> controls how many elements
of each list are used, and C<$y> controls how many lists are used. This
is just like the splitting operation, but in reverse. C<$i> simply
controls whether the lists are interleaved or just concatenated (the
same as C<@a = (@b, @c)>). Again, wildcards (0) can be used here as
well.

=head2 zip, unzip, and partition

RFCs 90 and 91 can now be entirely explained and written as
specialized forms of C<reshape>:

   Function            reshape Equivalent
   ----------------    ----------------------------------
   zip @a, @b          reshape 0, 0, 1, @a, @b
   unzip $y, @c        reshape 0, $y, 1, @c
   part $y, @c         reshape 0, $y, 0, @c

This makes it much easier to understand what the operations are doing
and how they differ. It also means that the functions can be extremely
compact, since they need only pass C<reshape> the appropriate
arguments.

This also makes it obvious that the existing functions only work on
the y axis of our imaginary matrix model. This means additional
functions that work just on the x axis may be useful as well.

=head2 Matrix Calculations and Extensions

It is the opinion of the author that extensive matrix calculations and
manipulations be left to external modules. A function such as this
should be able to take care of most of the basic functionality needed
to create N-dimensional matrices.
However, true matrix functions should be put in modules.

=head1 IMPLEMENTATION

We'll get to this in v10. :-)

=head1 MIGRATION

None. This introduces new functionality.

=head1 REFERENCES

RFC 81: Lazily evaluated list generation functions

http://www.mail-archive.com/perl6-language%40perl.org/msg01910.html

Thanks to Uri Guttman for suggesting the APL "reshape" name
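
=head1 APPENDIX: ILLUSTRATIVE SKETCH

The split and join rules described in this RFC can be approximated in
plain Perl 5. What follows is only a sketch of one reading of those
rules, not the proposed builtin. The names C<reshape_split> and
C<reshape_join> are hypothetical, and the join form takes array
references because Perl 5 flattens arrays passed as arguments. Several
behaviours are assumptions, since the RFC does not pin them down:
ceiling-sized chunks when C<$x> is a wildcard and C<$i> is 0, a single
output list when both dimensions are wildcards in the split form, and an
empty return when a joined list is shorter than a fixed C<$x>.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper for the single-array (SPLIT) form.
# Returns a list of array refs, or an empty list when data runs short.
sub reshape_split {
    my ($x, $y, $i, @data) = @_;
    my $n = @data;
    $x ||= 0;    # undef and 0 both mean "wildcard"
    $y ||= 0;

    if (!$x) {
        # Wildcard $x: keep every element, spread over $y lists.
        $y ||= 1;                                  # assumption: both wildcards => one list
        my @out   = map { [] } 1 .. $y;
        my $chunk = int(($n + $y - 1) / $y) || 1;  # assumption: ceiling-sized chunks
        for my $j (0 .. $n - 1) {
            my $k = $i ? $j % $y : int($j / $chunk);
            push @{ $out[$k] }, $data[$j];
        }
        return @out;
    }

    $y ||= int($n / $x);                  # wildcard $y: as many full lists as fit
    return if $y == 0 || $n < $x * $y;    # not enough data to fill
    my @out;
    for my $k (0 .. $y - 1) {
        # Interleaved lists step through the data by $y; plain lists are runs of $x.
        push @out, [ map { $data[ $i ? $k + $_ * $y : $k * $x + $_ ] } 0 .. $x - 1 ];
    }
    return @out;
}

# Hypothetical helper for the multiple-array (JOIN) form.
# Takes array refs and returns one flat list.
sub reshape_join {
    my ($x, $y, $i, @lists) = @_;
    $y ||= scalar @lists;                 # wildcard $y: use every list
    return if $y > @lists;                # fewer lists than requested
    my @rows;
    for my $list (@lists[ 0 .. $y - 1 ]) {
        my $len = $x || scalar @$list;    # wildcard $x: take the whole list
        return if $len > @$list;          # assumption: a short list means "not enough data"
        push @rows, [ @{$list}[ 0 .. $len - 1 ] ];
    }
    return map { @$_ } @rows unless $i;   # $i == 0: plain concatenation
    my @out;
    my $longest = max(0, map { scalar @$_ } @rows);
    for my $j (0 .. $longest - 1) {       # $i == 1: round-robin, skipping exhausted lists
        push @out, map { $j < @$_ ? $_->[$j] : () } @rows;
    }
    return @out;
}

my @m = reshape_split(3, 2, 1, 1 .. 23);                        # ([1,3,5], [2,4,6])
my @z = reshape_join(0, 0, 1, [1,4,7,10], [2,5,8], [3,6,9]);    # (1 .. 10)
```

Under the same assumptions, the zip, unzip, and part operations reduce
to one-line wrappers, for example
C<sub unzip { my $y = shift; reshape_split(0, $y, 1, @_) }>.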
