[random thoughts I had one week ago at YAPC::Europe and am jotting down now;
I was unsubscribed from the p6 lists during that time, so awfully sorry if
this subject has already been beaten to death]
When designing the bytecode of Perl 6, special care and planning should
be directed towards data storage and representation. In more concrete
terms, we should have, for example, bytecodes for
newpvn
newiv
newuv
newnv
newav
newhv
newrv
(using perl5ish terms like pv here just for the sake of clarity;
whether those terms carry over to perl6 remains to be seen). The
above list is not complete; it is there only as a basis for
discussion.
Having such bytecodes available is essential for the proposed
functionality of dumping the state of a live, running Perl virtual
machine and breathing life into it later.
Some more detailed examples:
newpvn flags length data
newiv flags iv
newav flags length (length byteop clusters creating the scalars)
newav flags2 length (length offsets to data) length2 data
newav flags2 length (length nvs)
newav flags3 length (length index byteop cluster pairs creating the scalars)
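A minimal Python sketch of how the first two forms above might be laid out in a binary stream and read back. The opcode numbers, the one-byte flags field, the 32-bit length, the 64-bit IV and the little-endian byte order are all assumptions made for illustration; the note does not fix any of them.

```python
import struct

# Hypothetical opcode numbers; the note does not assign any.
OP_NEWPVN = 0x01
OP_NEWIV = 0x02

# Assumed layout: 1-byte opcode, 1-byte flags, then operands in
# little-endian order, with 32-bit lengths and 64-bit signed IVs.
def emit_newpvn(flags: int, data: bytes) -> bytes:
    """newpvn flags length data"""
    return struct.pack("<BBI", OP_NEWPVN, flags, len(data)) + data

def emit_newiv(flags: int, iv: int) -> bytes:
    """newiv flags iv"""
    return struct.pack("<BBq", OP_NEWIV, flags, iv)

def decode_one(buf: bytes, pos: int = 0):
    """Decode one op at pos; return ((opcode, flags, value), next_pos)."""
    op, flags = struct.unpack_from("<BB", buf, pos)
    pos += 2
    if op == OP_NEWPVN:
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        value = bytes(buf[pos:pos + length])
        pos += length
    elif op == OP_NEWIV:
        (value,) = struct.unpack_from("<q", buf, pos)
        pos += 8
    else:
        raise ValueError(f"unknown opcode {op:#x}")
    return (op, flags, value), pos
```

For instance, `emit_newpvn(0, b"hello") + emit_newiv(0, -42)` yields a 21-byte stream that two calls to `decode_one` walk through in order.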
The above bytecodes will create anonymous blobs of data. Where in the
PVM the blobs end up (on the stack or in a register) is left open in
this note. How the blobs are bound to scalar variables (and with what
scope), and how they are stored into aggregate variables (arrays,
hashes), is a subject for another discussion. The newav flags2
examples are quick ideas for creating strongly typed arrays, first of
pvs and then of nvs. The flags3 example is a quick idea for supporting
sparse array creation.
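The typed and sparse newav variants can be sketched the same way. In this hypothetical Python sketch, FLAG_TYPED_NV and FLAG_SPARSE stand in for flags2 and flags3, the leading newav opcode byte is omitted, and each "byteop cluster" of the sparse form is simplified to a single 64-bit IV; the decoded sparse array is returned as a dict keyed by index.

```python
import struct

# Hypothetical flag values standing in for the note's flags2 and flags3.
FLAG_TYPED_NV = 0x02
FLAG_SPARSE = 0x03

def emit_newav_nv(values):
    """newav flags2 length (length nvs): a header, then raw doubles."""
    n = len(values)
    return struct.pack(f"<BI{n}d", FLAG_TYPED_NV, n, *values)

def emit_newav_sparse(pairs):
    """newav flags3 length (index value pairs); values simplified to IVs."""
    out = struct.pack("<BI", FLAG_SPARSE, len(pairs))
    for index, iv in pairs:
        out += struct.pack("<Iq", index, iv)
    return out

def decode_newav(buf):
    """Decode either variant: a list for typed NVs, a dict for sparse."""
    flags, n = struct.unpack_from("<BI", buf, 0)
    pos = 5  # 1-byte flags + 4-byte length
    if flags == FLAG_TYPED_NV:
        return list(struct.unpack_from(f"<{n}d", buf, pos))
    if flags == FLAG_SPARSE:
        array = {}
        for _ in range(n):
            index, iv = struct.unpack_from("<Iq", buf, pos)
            pos += 12  # 4-byte index + 8-byte IV
            array[index] = iv
        return array
    raise ValueError(f"unsupported newav flags {flags:#x}")
```

Packing the NVs as one run of raw doubles is what keeps the typed form compact: there is no per-element opcode to dispatch on. The sparse form costs space only for the indices actually present.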
The knowledge gathered from writing the modules Storable,
Freeze::Thaw, and Data::Dumper should be studied very carefully.
Of course, other bytecodes, such as those of the JVM and Emacs Lisp,
should also be researched. For example, have their formats ever
changed in backwards incompatible ways? If so, why? What forced those
changes, and how can we try to avoid such embarrassments?
The data created by the bytecode should be stored in a binary format
for speed and compactness. So far, at least, Perl's approach to data
portability issues such as integer width, character encoding, and
floating point format has been 'do what comes naturally on the
current runtime platform', as opposed to Java's strict specification
that, e.g., an int is 32 bits and a float is an IEEE 754 32-bit value.
There are benefits to Java's approach, but I think it is wrong for us.
Firstly, it shifts the burden of any needed conversions away from the
authors of the generic VM and onto the VM implementer for each
platform. That is understandable from Sun's point of view, but
probably not right for us: we should do the conversions ourselves.
Secondly, it makes things slower if your platform is not identical to
the chosen 'standard': each time your data travels into the bowels of
the VM it needs to be converted, and when it comes back out, it needs
to be converted again. An old network programming adage is 'receiver
makes right': the sender should just send, and do it fast, and let
the receiver worry about any conversions that may be required.
Never mind the 'network' aspect: the principle applies equally well to
any I/O, because CPU/memory latencies will always be orders of
magnitude lower than disk/network latencies. The rationale is that
there is little sense in trying to be portable in the data you deal
with, because you can never anticipate which formats and encodings the
other end of the transaction would like to have. Just state which
format and encoding your data is in, and then send/write it away.
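"Receiver makes right" can be illustrated with a small Python sketch: the writer tags the data with its native byte order and dumps its integers unconverted, and the reader decodes according to the tag, so data from a foreign-endian writer is converted on the receiving side only. The one-byte 'L'/'B' tag and the function names are hypothetical; struct's '=' prefix means native byte order with standard sizes and no padding.

```python
import struct
import sys

def write_ints(values):
    """Sender: tag with the native byte order, then dump ints unconverted."""
    tag = b"L" if sys.byteorder == "little" else b"B"
    n = len(values)
    # "=": native byte order, standard sizes (4-byte I and i), no padding.
    return tag + struct.pack(f"=I{n}i", n, *values)

def read_ints(buf):
    """Receiver: decode according to the sender's tag."""
    order = "<" if buf[:1] == b"L" else ">"
    (n,) = struct.unpack_from(order + "I", buf, 1)
    return list(struct.unpack_from(f"{order}{n}i", buf, 5))
```

The writer never pays for a conversion; a reader on a machine with the same byte order as the writer reads the bytes as they are, and only a mismatched reader does any swapping.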
Whether the pv data is in opaque 8-bit bytes, UTF-8, or some other
encoding is not discussed further here. The question of which format
the numbers are in (integer width and byte order, floating point
format) can be dealt with by having a bytecode declaration block that
declares the native formats and encodings in use. The declared integer
width and byte order will then be used, for example, for the 'length'
and 'width' fields above. If the formats/encodings have not been
declared, the bytecodes that depend on them should abort immediately.
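A declaration block and the abort-if-undeclared rule might look like this. It is a hypothetical Python sketch: Loader, declare(), read_length() and UndeclaredFormatError are illustrative names, and only byte order and integer width (2, 4 or 8 bytes) are modelled; floating point format and string encoding would be declared in the same way.

```python
import struct

class UndeclaredFormatError(Exception):
    """Raised when a format-dependent op runs before any declaration."""

class Loader:
    def __init__(self):
        # struct format for 'length' fields; None until declared.
        self.int_fmt = None

    def declare(self, byteorder: str, int_width: int):
        """Hypothetical declaration block: byte order plus integer width."""
        order = {"little": "<", "big": ">"}[byteorder]
        code = {2: "H", 4: "I", 8: "Q"}[int_width]
        self.int_fmt = order + code

    def read_length(self, buf: bytes, pos: int):
        """Read one 'length' field; abort if no format was declared."""
        if self.int_fmt is None:
            raise UndeclaredFormatError("no format declaration block seen")
        (length,) = struct.unpack_from(self.int_fmt, buf, pos)
        return length, pos + struct.calcsize(self.int_fmt)
```

Failing loudly here, rather than guessing a default, keeps a bytecode file written on one platform from being silently misread on another.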
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen