request new Mapping|Hash operators

2007-02-27 Thread Darren Duncan

All,

I believe that there is some room for adding several new convenience 
operators or functions to Perl 6 that are used with Mapping and Hash 
values.


Or getting more to the point, I believe that the need for the 
relational data model concept of a tuple (a tuple where elements 
are addressed by name not position) would be satisfied by the 
existing Perl 6 data types of Mapping (immutable variant) and Hash 
(mutable variant), but that some common relational operations would 
be a lot easier to express if Perl 6 had a few more operators that 
make them concise.


Below I will name some of these operators that, AFAIK, don't exist 
yet in some form; since they are all pure functions, I will use the 
Mapping type in their pseudo-Perl-6 signatures, but Hash versions 
should exist too.  Or specifically, these should be part of the 
Mapping role, so anything that .does Mapping, such as a Hash, does 
them too?  Some of these operators are like those for sets, but 
aren't exactly the same due to plain set ops not working for mappings 
or hashes as a whole.


I want to emphasize that the operator names are those that are used 
in DBMS contexts, but you can of course name them something else in 
order for them to fit better into Perl 6; the importance is having 
some concise way to get the desired semantics.  Also, this 
functionality doesn't have to be with new operators, but could 
utilize existing ones if there is a concise way to do so.  Likewise, 
some could conceivably be macros, if it wouldn't impair performance.


I also want to emphasize that I see this functionality being 
generally useful, and that it shouldn't just be shunted off to a 
third-party module.


1.  join() aka natural_join():

function join of Mapping (Mapping $m1, Mapping $m2) { ... }

	This binary operator is conceptually like a set-union 
operator, in that it derives a Mapping that has all of the distinct 
keys and values of its 2 arguments, assuming any matching keys also 
have matching values.  (Note that matching specifically means that 
=== returns true, or if users get a choice, then that is its default 
meaning.)


	But if there are any matching keys with mismatching values, 
then this is a failure condition (they are incompatible), and the 
function returns undef instead (or fail, though given the anticipated 
use case, undef is more appropriate).  It is only possible for 2 
arguments to be incompatible if they have any keys in common; if they 
have none, the result is guaranteed to be defined/successful.  If the 
2 arguments have all keys in common, they must be equal, and the 
result is also equal to either.


	This join() function is both commutative and associative, and 
can generalize to N arguments.  Any equal arguments are redundant and 
so duplicates can be ignored.  Given 2 or more arguments, each is 
unioned pairwise until 1 remains.  Given 1 argument, the result is 
that argument.  Given zero arguments, the result is a Mapping with 
zero elements.  A zero-element Mapping is its identity value.


	So join() can be used as a reduction operator, with the empty 
Mapping as its identity, except that it can return undef (or fail) 
if any 2 arguments have the same keys but different associated values.


For examples:

join( { a => 1, b => 2 }, { b => 2, c => 3 } )
# returns { a => 1, b => 2, c => 3 }
join( { a => 1, b => 2 }, { b => 4, c => 3 } )
# returns undef
join( { a => 1, b => 2 }, { c => 3, d => 4 } )
# returns { a => 1, b => 2, c => 3, d => 4 }
join( { a => 1, b => 2 }, { a => 1 } )
# returns { a => 1, b => 2 }
join( { a => 1 } )
# returns { a => 1 }
join( { a => 1 }, {} )
# returns { a => 1 }
join()
# returns {}
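
For readers who want to experiment with these semantics, here is a minimal Python sketch of the tuple join described above. It is not Perl 6 and not a proposed implementation: the name tuple_join is hypothetical, Python's None stands in for undef, and Python's != comparison stands in for the === matching rule.

```python
from typing import Mapping, Optional


def tuple_join(*mappings: Mapping) -> Optional[dict]:
    """Union all mappings; return None if a shared key has differing values."""
    result: dict = {}
    for m in mappings:
        for key, value in m.items():
            if key in result and result[key] != value:
                return None  # incompatible: same key, different value
            result[key] = value
    return result


compatible = tuple_join({"a": 1, "b": 2}, {"b": 2, "c": 3})
incompatible = tuple_join({"a": 1, "b": 2}, {"b": 4, "c": 3})
empty = tuple_join()
```

Called with zero arguments the sketch returns an empty dict, which matches the identity value described above.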

	In practice, if a relation were implemented, say, as a set of 
Mapping, then the relational (natural) join could then be implemented 
sort of like this:


function join of Relation (Relation $r1, Relation $r2) {
return Relation( grep -- $r1.values XjoinX $r2.values );
}

	That is, the relational (natural) join could then simply be 
implemented as a pairwise invocation of the tuple join between every 
tuple in each relation, keeping only the results that are defined.


	In this wider sense, a relational (natural) join is both an 
intersection in one dimension and a union in the other dimension.
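
The pairwise construction can be tried out with a small Python sketch. It assumes a relation is simply a list of dicts (the message notes that a real Relation is more complicated than that), and the names tuple_join and relation_join, as well as the sample rows, are hypothetical.

```python
from itertools import product


def tuple_join(m1: dict, m2: dict):
    """Join two tuples; None when a shared key has differing values."""
    if any(k in m2 and m2[k] != v for k, v in m1.items()):
        return None
    return {**m1, **m2}


def relation_join(r1: list, r2: list) -> list:
    """Natural join: tuple-join every pair and keep only defined results."""
    joined = []
    for t1, t2 in product(r1, r2):
        t = tuple_join(t1, t2)
        if t is not None and t not in joined:  # drop failures and duplicates
            joined.append(t)
    return joined


# Hypothetical sample data, for illustration only.
names = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]
colours = [{"id": 1, "colour": "red"}, {"id": 3, "colour": "blue"}]
matched = relation_join(names, colours)
```

When the two relations share no keys, every pair is compatible and the result degenerates to a cross product.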


	Now, I'm not currently asking for Relation to be implemented 
as a Perl 6 feature (it is actually more complicated than set of 
mapping), but if Mapping|Hash had an operator like I mentioned, it 
would be easier to make one on top of it; moreover, the Mapping|Hash 
could also implement the heading of a relation (a 
name-to-declared-type map), not just its body composed of tuples 
(each being a name-to-value map).


2.  semijoin() aka matching():

function semijoin of Mapping (Mapping $m1, Mapping $m2) { ... }

	This operator is like join() except that it will simply 
return $m1 if the arguments are 

Re: [S09] Whatever indices and shaped arrays

2007-02-27 Thread David Green

On 2/24/07, Jonathan Lang wrote:
> In effect, using * as an array of indices gives us the ordinals
> notation that has been requested on occasion: '*[0]' means 'first
> element', '*[1]' means 'second element', '*[-1]' means 'last
> element', '*[0..2]' means 'first three elements', and so on - and
> this works regardless of what the actual indices are.


Using * that way works, but it still is awkward, which makes me think 
there's something not quite dropping into place yet.  We have the 
notion of keyed indexing via [] and counting/ordinal indexing via 
[*[]], which is rather a mouthful.  So I end up back at one of 
Larry's older ideas, which basically is: [] for counting, {} for keys.


To put a slight twist on it: instead of adding {}-indexing to arrays, 
consider that what makes something an array is that it doesn't have 
keys -- it's a collection of things that you can count through, as 
opposed to a collection that you search through by meaningful 
keys/names/tags/references/etc.  (E.g., consider positional vs. named 
params, and how they naturally map onto an array and a hash 
respectively.)


Now something that is countable doesn't have to have meaningful keys, 
but any keyed collection can be counted through; hence it makes sense 
to give hashes an array-like [] accessor for getting the 
first/last/nth item in the hash.  In fact, this is basically what 
%h.values gives you -- turning the hash values into an array (well, a 
list).  Saying %h[n] would amount to a direct way of saying 
@(%h.values)[n].
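
As a rough analogy in Python (not Perl 6), ordinal access to a keyed collection can be sketched as indexing into its list of values. The helper name nth_value and the sample data are hypothetical, and the sketch relies on Python dicts preserving insertion order.

```python
def nth_value(h: dict, n: int):
    """Return the n-th value of h in iteration order; negative n counts from the end."""
    return list(h.values())[n]


h = {"apple": 3, "pear": 5, "plum": 8}
first = nth_value(h, 0)    # counting access, independent of the key names
last = nth_value(h, -1)
by_key = h["pear"]         # keyed access remains available alongside it
```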


This becomes much more handy in P6, because hashes can be ordered. 
(Not that there's anything stopping you from counting through an 
unordered hash; %h[0] is always the first element of %h, you just 
might not know what that is, the same as with %h.values.)  If Perl 
knows how to generate new keys on the fly (say, because your possible 
hash keys were declared as something inc-/dec-rementable), then you 
can even access elements off the ends of your hash (push/unshift).


What about shaped arrays?  A shape means the indices *signify* 
something (if they didn't, you wouldn't care, you'd just start at 
0!).  So they really are *keys*, and thus should use a hash (which 
may not use any hash tables at all, but it's still an associative 
array because it associates meaningful keys with elements).  I'm not 
put off by calling it a hash -- I trust P6 to recognise when I 
declare a hash that is restricted to consecutive int keys, is 
ordered, etc. and to optimise accordingly.


If there are no meaningful lookup keys, if all I can do to get 
through my list is count the items, then an array is called for, and 
it can work in the usual way: start at 0, end at -1.  It is useful to 
be able to count past the ends of an array, and * can do this by 
going beyond the end: *+1, *+2, etc., or before the beginning: *-1, 
*-2, etc.  (This neatly preserves the notion of * as all the 
elements -- *-1 is the position before everything, and *+1 is the 
position after everything else.)



Well, at least this keeps the easy stuff (counting) easy, and the 
barely-harder stuff (keying) possible.  In fact, since hashes would 
always have both views available, nothing is lost; we get ordinals 
for hashes, shaped collections, and ones that you can pass to a sub 
without losing their shape, it solves the problem of distinguishing 
between ordinal vs. funny indices (and the related issues of 
wrap-around), you can count past the edges, and all while preserving 
familiar array behaviour (especially for P5 veterans), the meaning of 
* as everything, and uncluttered syntax.



-David


Re: request new Mapping|Hash operators

2007-02-27 Thread Aaron Crane
Darren Duncan writes:
> I believe that there is some room for adding several new convenience
> operators or functions to Perl 6 that are used with Mapping and Hash
> values.
<snip>
> I also want to emphasize that I see this functionality being generally
> useful, and that it shouldn't just be shunted off to a third-party
> module.

Um, why not?

Or rather, why do these need to be part of standard Perl 6.0.0?
Even assuming that they are generally useful, the volunteers donating
their time to implement Perl 6.0.0 shouldn't feel compelled to build
every last feature that might be considered generally useful.

(As it happens, I'm not entirely convinced that these operations are
generally useful in the same way as, say, multiplication, or string
concatenation, or cross hypering, but I think that's a side issue.)

I think it would be reasonable for someone who believes that these
operations are generally useful to attempt to write a Perl 6 module that
provides them.  If that effort goes well, maybe the module will be
included in Perl 6.0.0.  If the module can't be written, or can't be
made efficient, that is presumably interesting to the designers of the
language and of its implementation(s).  But "I need these operations"
does not imply "Perl 6.0.0 needs these operations".

-- 
Aaron Crane


request: num16

2007-02-27 Thread Geoffrey Broadwell
I'd like to request that num16 and therefore complex16 be added to S09,
and made optional just as num128 and complex128 are.  The half-sized
floating point type is commonly used in computer graphics and widely
supported by GPUs and High Dynamic Range image formats such as OpenEXR.


-'f
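
To give a feel for what a 16-bit float can hold, here is a small Python sketch using the standard struct module, whose "e" format code is IEEE 754 binary16. It says nothing about how num16 would behave in Perl 6; the helper names are hypothetical.

```python
import struct


def to_half_bytes(x: float) -> bytes:
    """Encode x as a little-endian IEEE 754 half-precision value (2 bytes)."""
    return struct.pack("<e", x)


def from_half_bytes(b: bytes) -> float:
    """Decode 2 little-endian bytes as a half-precision value."""
    return struct.unpack("<e", b)[0]


one = to_half_bytes(1.0)                          # bit pattern 0x3C00
roundtrip = from_half_bytes(to_half_bytes(0.1))   # close to 0.1, not equal
largest = from_half_bytes(b"\xff\x7b")            # bit pattern 0x7BFF
```

The round trip shows the precision loss (roughly three decimal digits survive), and 0x7BFF is the largest finite half-precision value.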




Bit shifts on low-level types

2007-02-27 Thread Geoffrey Broadwell
How are the bitwise shifts defined on low level types?  In particular,
for right shift, does high bit extension or zero fill occur?  Does the
answer depend on whether the low level type is signed or not?

On the flip side, it seems more useful if we have both operators
available for either signed or unsigned types, to avoid having to do
pointless casting just to change the meaning of +>.  Perhaps having both
+> and ?> operators?  Since coerce to boolean and then right shift is
meaningless, this seems ripe to DWIM.  (For me, DWIM here means +> does
high bit extension, ?> does zero fill.)


-'f
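
The two right-shift behaviours being asked about can be sketched in Python on an 8-bit pattern. This only illustrates high-bit extension versus zero fill; the function names and the fixed 8-bit width are assumptions made for the example.

```python
BITS = 8
MASK = (1 << BITS) - 1


def logical_shift_right(value: int, n: int) -> int:
    """Zero fill: treat the low 8 bits as unsigned and shift."""
    return (value & MASK) >> n


def arithmetic_shift_right(value: int, n: int) -> int:
    """High-bit extension: treat the low 8 bits as signed, shift, return the 8-bit pattern."""
    v = value & MASK
    if v & (1 << (BITS - 1)):
        v -= 1 << BITS          # reinterpret as a negative number
    return (v >> n) & MASK      # Python's >> on negative ints extends the sign


zero_filled = logical_shift_right(0b10010000, 2)        # 0b00100100
sign_extended = arithmetic_shift_right(0b10010000, 2)   # 0b11100100
```

When the high bit is clear, both functions return the same result.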




Rotation ops?

2007-02-27 Thread Geoffrey Broadwell
Does Perl 6 have (bit/string) rotation ops?  Especially bit rotation on
low-level integer types would be Nifty for making some numeric
algorithms cleaner, more self documenting, and potentially faster than
forcing the use of a combination of other bitwise ops to do the same
thing.


-'f
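
For comparison, this is the kind of "combination of other bitwise ops" a rotation currently requires, sketched in Python for a fixed-width unsigned value. The function names and the default 8-bit width are hypothetical.

```python
def rotate_left(value: int, n: int, bits: int = 8) -> int:
    """Rotate the low `bits` bits of value left by n positions."""
    mask = (1 << bits) - 1
    n %= bits
    value &= mask
    return ((value << n) | (value >> (bits - n))) & mask


def rotate_right(value: int, n: int, bits: int = 8) -> int:
    """Rotate right by n, expressed as a left rotation by the complement."""
    return rotate_left(value, bits - (n % bits), bits)


rotated = rotate_left(0b10010001, 1)   # the high bit wraps around to bit 0
restored = rotate_right(rotated, 1)
```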




Low-level types and over/underflow

2007-02-27 Thread Geoffrey Broadwell
What happens when a low-level type overflows or underflows?  For
example, if you decrement an int1 twice, what happens?

If you increment a uint8 past 255, do you get:

1. A uint8  with value 0?
2. A uint16 with value 256?
3. A failure?

What about incrementing an int8 past 127?  Do you get:

1. An int8  with value -128?
2. A uint8  with value 128?
3. An int16 with value 128?
4. A failure?

In both cases, I think option 1 is best, but I can see that option 2 in
the signed case might make sense in certain circumstances.  Personally,
I'd prefer to keep option 1 always, because if I want option 2 I should
cast to uint or a larger int first.


-'f
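
Option 1 in both lists is two's-complement wrap-around, which can be sketched in Python as follows. This illustrates that option only, not what Perl 6 decided; the helper names are hypothetical.

```python
def wrap_unsigned(value: int, bits: int) -> int:
    """Reduce value into the range 0 .. 2**bits - 1."""
    return value % (1 << bits)


def wrap_signed(value: int, bits: int) -> int:
    """Reduce value into the range -2**(bits-1) .. 2**(bits-1) - 1."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


uint8_after_255 = wrap_unsigned(255 + 1, 8)
int8_after_127 = wrap_signed(127 + 1, 8)

# An int1 holds only -1 or 0, so decrementing 0 twice wraps back to 0.
once = wrap_signed(0 - 1, 1)
twice = wrap_signed(once - 1, 1)
```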




Casting and low-level types

2007-02-27 Thread Geoffrey Broadwell
What happens when you cast between low-level types?  If the source value
is out of range of the destination type, do you get:

1. An exception?
2. Clip to finite range always?
3. Clip to finite range for ints, clip to infinities for nums?
4. Exception when dest is int, clip to infinities when dest is num? 
5. Copy bits that fit from source to dest and reinterpret?

Personally, I think option 1 or option 4 make the most sense for
conversion between int, uint, num, and complex types, while either
option 1 or option 5 make sense for conversion to/from buf types.

Also, when casting from a num type to an int type, is there a way to
specify desired rounding/truncation behavior in a way that allows the
most efficient code under the covers, rather than making a side trip
through Num and Int?


-'f
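
Three of the integer options above (1, 2 and 5) can be compared side by side with a small Python sketch. The function names are hypothetical, and only signed integer destinations are modelled.

```python
def int_range(bits: int):
    """Inclusive range of a signed integer type of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def cast_checked(value: int, bits: int) -> int:
    """Option 1: raise if the value does not fit."""
    lo, hi = int_range(bits)
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in int{bits}")
    return value


def cast_clipped(value: int, bits: int) -> int:
    """Option 2: clip to the finite range."""
    lo, hi = int_range(bits)
    return max(lo, min(hi, value))


def cast_reinterpret(value: int, bits: int) -> int:
    """Option 5: keep the low bits and reinterpret them as signed."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


clipped = cast_clipped(300, 8)
reinterpreted = cast_reinterpret(300, 8)   # 300 is 0x12C; the low byte is 0x2C
```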




Compact structs and byte-stringification

2007-02-27 Thread Geoffrey Broadwell
How do you specify that you want to byte-stringify a compact struct,
rather than normal stringify it?

Does the byte-stringified version include internal and/or trailing
alignment padding?  How do you specify the other choices?

Whether or not trailing padding is included when byte-stringifying a
single compact struct, is the choice the same when byte-stringifying an
array of same?  In other words, are you guaranteed that the
byte-stringify of an array of compact structs is merely the
concatenation of the byte-stringification of each struct?


-'f
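
The padding questions can be made concrete with Python's ctypes, which follows the platform C compiler's layout rules. The sketch assumes a hypothetical struct with a 32-bit count followed by an 8-bit tag, and a platform where a 32-bit integer is 4-byte aligned.

```python
import ctypes
import struct


class Sample(ctypes.Structure):
    # Hypothetical struct: a 32-bit count followed by an 8-bit tag.
    _fields_ = [("count", ctypes.c_uint32), ("tag", ctypes.c_uint8)]


padded_size = ctypes.sizeof(Sample)     # C layout, including trailing padding
packed_size = struct.calcsize("<IB")    # the same two fields with no padding

items = [Sample(1, 2), Sample(3, 4), Sample(5, 6)]
array_bytes = bytes((Sample * 3)(*items))
concatenated = b"".join(bytes(item) for item in items)
```

In this C-style layout the bytes of the array are exactly the concatenation of the bytes of each element, but only because each element's serialization includes its trailing padding.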




Re: request new Mapping|Hash operators

2007-02-27 Thread Nicholas Clark
On Tue, Feb 27, 2007 at 12:18:20AM -0800, Darren Duncan wrote:

 4.  rename():
 
   function rename of Mapping (Mapping $m, Str $old_k, Str $new_k) { 
   ... }
 
   This operator takes one Mapping argument and derives another 

rename is a Perl 5 builtin. I didn't think that it had been dropped for
Perl 6.

Nicholas Clark


Expressions with mixed types including low-level types

2007-02-27 Thread Geoffrey Broadwell
How are casting and coercion handled in expressions involving mixed
low-level and high-level types?

For example, what is the output of this?

my Int  $ten = 10;
my int4 $a   = 0;
my int4 $b;

$b = ($a + 2.4 * $ten) / 4;
say $b;

The answers to the above questions may alter my view on the proper
handling of overflow during casting/coercion.


-'f
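
To show why more than one answer is defensible, here is a Python sketch of two evaluation strategies for the expression above. Both are hypothetical readings, not Perl 6 semantics: one keeps intermediates exact and narrows only at the final store, the other forces every intermediate into the int4 range (-8..7) by truncating and saturating. The message does not say how each candidate answer arises.

```python
from fractions import Fraction
import math

INT4_MIN, INT4_MAX = -8, 7  # range of a signed 4-bit integer


def saturate(x: int) -> int:
    """Clip x to the int4 range."""
    return max(INT4_MIN, min(INT4_MAX, x))


def wide_intermediates(factor: Fraction) -> int:
    """Evaluate exactly, then truncate and range-check at the final store."""
    result = (0 + factor * 10) / 4
    stored = math.trunc(result)
    if not INT4_MIN <= stored <= INT4_MAX:
        raise OverflowError(f"{stored} does not fit in int4")
    return stored


def narrow_every_step(factor: Fraction) -> int:
    """Truncate and saturate every intermediate to int4."""
    f = saturate(math.trunc(factor))     # 2.4 becomes 2
    product = saturate(f * 10)           # 20 saturates to 7
    total = saturate(0 + product)
    return math.trunc(Fraction(total, 4))


wide = wide_intermediates(Fraction("2.4"))
narrow = narrow_every_step(Fraction("2.4"))
```

Fractions are used instead of floats so that the sketch does not add binary rounding effects of its own.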




Re: Expressions with mixed types including low-level types

2007-02-27 Thread Geoffrey Broadwell
On Tue, 2007-02-27 at 09:20 -0800, Geoffrey Broadwell wrote:
> How is casting and coercion handled with expressions involving mixed low
> and high level types?
>
> For example, what is the output of this?
>
> my Int  $ten = 10;
> my int4 $a   = 0;
> my int4 $b;
>
> $b = ($a + 2.4 * $ten) / 4;
> say $b;
>
> The answers to the above questions may alter my view on the proper
> handling of overflow during casting/coercion.

And for those who think the above code is too easy -- and I can see at
least 1, 2, 5, and 6 as defensible answers -- try 2.8 instead of 2.4.


-'f




Re: request new Mapping|Hash operators

2007-02-27 Thread Smylers
Darren Duncan writes:

> I believe that ... some common relational operations would be a lot
> easier to express if Perl 6 had a few more operators that make them
> concise.

I am prepared to believe that.  But what I'm unclear on is when I'd want
to perform a common relational operation.

Please could you give an example of something which is useful -- that is
useful as a means to some other end, not merely useful to somebody who
has an interest in relational theory -- but which is currently awkward,
and then give the same example again showing how much better it is with
your proposed functions?

> I also want to emphasize that I see this functionality being generally
> useful, and that it shouldn't just be shunted off to a third-party
> module.

Why is being in a module being "shunted off"?  You could put everything
in the main namespace but that way PHP, ahem I mean madness, lies.

Nicholas already pointed out that in Perl 5 C<rename> exists, as an
operation on files.  That shows the problem with using generic function
names for quite specific operations without there being any surrounding
context.  Many people rarely use C<rename>, because they happen to be
using Perl for things other than dealing with the filesystem, but the
existence of that function clobbers a useful name.

Rather than fighting over it, it strikes me as much more sensible to have
a module for filesystem operations and another for relational
operations, then users can import the functions that they actually use.

Note that being in a module doesn't (necessarily) mean 'not distributed
with core Perl'.

> 1.  join() aka natural_join():

Remember that Perl already has a C<join> function, for joining strings.

Smylers


Re: Bit shifts on low-level types

2007-02-27 Thread Smylers
Geoffrey Broadwell writes:

> How are the bitwise shifts defined on low level types?  In particular,
> for right shift, does high bit extension or zero fill occur?  Does the
> answer depend on whether the low level type is signed or not?
>
> On the flip side, it seems more useful if we have both operators
> available ...

Dealing with anything as low-level as bits seems to be very rare in Perl 5
programming.

Introducing more operators to the core language, especially terse
punctuationy ones, for something rarely used strikes me as a way of
making the documentation fatter and raising the barrier to entry for
little benefit.

> Perhaps having both +> and ?> operators?  Since coerce to boolean and
> then right shift is meaningless, ...

It's useless, rather than meaningless; you've neatly defined what the
meaning of that (useless) operator would be.

That is, at the moment there are consistent rules for being able to
correctly guess the meaning of an operator based on knowledge of other
operators.  Your suggestion would break that; just because some
combination of symbols doesn't currently have a use doesn't mean that it
makes sense to appropriate them for something else.

> this seems ripe to DWIM.

But DWIM is the meaning you previously defined, surely?

> (For me, DWIM here means +> does high bit extension, ?> does zero
> fill.)

Why?  You think that somebody not knowing about this operator would
correctly infer its existence from other operators?  Even if somebody
guessed that both operators exist it looks pretty arbitrary which is
which.

For this esoteric sort of stuff can't we have named operators (short
names if you like, perhaps taken from assembly language), in a module
that can be loaded by those who need them?

Smylers


Re: Bit shifts on low-level types

2007-02-27 Thread Nicholas Clark
On Tue, Feb 27, 2007 at 06:31:31PM +, Smylers wrote:
> Geoffrey Broadwell writes:
>
> > How are the bitwise shifts defined on low level types?  In particular,
> > for right shift, does high bit extension or zero fill occur?  Does the
> > answer depend on whether the low level type is signed or not?
> >
> > On the flip side, it seems more useful if we have both operators
> > available ...
>
> Dealing with anything as low-level as bits seems to be very rare in Perl 5
> programming.

It's one of the things that Perl 5 is bad at. Not because it can't do it,
but because it's terribly terribly slow (compared with C).

You don't want to write a linker in Perl.

> For this esoteric sort of stuff can't we have named operators (short
> names if you like, perhaps taken from assembly language), in a module
> that can be loaded by those who need them?

I think that we can learn from PHP here. :-)

Nicholas Clark


Re: Bit shifts on low-level types

2007-02-27 Thread John Macdonald
On Tue, Feb 27, 2007 at 06:31:31PM +, Smylers wrote:
> Geoffrey Broadwell writes:
>
> > Perhaps having both +> and ?> operators?  Since coerce to boolean and
> > then right shift is meaningless, ...
>
> It's useless, rather than meaningless; you've neatly defined what the
> meaning of that (useless) operator would be.
>
[ ... ]
>
> > this seems ripe to DWIM.
>
> But DWIM is the meaning you previously defined, surely?
>
> > (For me, DWIM here means +> does high bit extension, ?> does zero
> > fill.)
>
> Why?  You think that somebody not knowing about this operator would
> correctly infer its existence from other operators?  Even if somebody
> guessed that both operators exist it looks pretty arbitrary which is
> which.

While I tend somewhat to agree that this level of bit
manipulation is not common enough to justify warping the
language, I disagree that the choice of meaning between +>
and ?> is arbitrary and not subject to inference.  The normal
assembler opcodes for the two forms of right shift are LRS
(logical right shift) and ARS (arithmetic right shift) with some
variation in spelling for different hardware architectures.
The arithmetic variant propagates the sign bit; the boolean
variant inserts zeros.  A sign bit is an integer property
that has no meaning in boolean context.  It would be hard to
find any rationale for reversing the meaning of the two.

-- 


Re: [S09] Whatever indices and shaped arrays

2007-02-27 Thread Jonathan Lang

David Green wrote:

> On 2/24/07, Jonathan Lang wrote:
> > In effect, using * as an array of indices gives us the ordinals
> > notation that has been requested on occasion: '*[0]' means 'first
> > element', '*[1]' means 'second element', '*[-1]' means 'last
> > element', '*[0..2]' means 'first three elements', and so on - and
> > this works regardless of what the actual indices are.
>
> Using * that way works, but it still is awkward, which makes me think
> there's something not quite dropping into place yet.  We have the
> notion of keyed indexing via [] and counting/ordinal indexing via
> [*[]], which is rather a mouthful.  So I end up back at one of
> Larry's older ideas, which basically is: [] for counting, {} for keys.


What if you want to mix the two?  I want the third element of row 5.
In my proposal, that would be @array[5, *[2]]; in your proposal,
there does not appear to be a way to do it.

Unless the two approaches aren't mutually exclusive: @array{5,
*[2]}.  That is, allow subscripted Whatevers within curly braces
to enable the mixing of ordinals and keys.  Since this is an unlikely
situation, the fact that nesting square braces inside curly braces is
a bit uncomfortable isn't a problem: this is a case of making hard
things possible, not making easy things easy.


> What about shaped arrays?  A shape means the indices *signify*
> something (if they didn't, you wouldn't care, you'd just start at
> 0!).  So they really are *keys*, and thus should use a hash (which
> may not use any hash tables at all, but it's still an associative
> array because it associates meaningful keys with elements).  I'm not
> put off by calling it a hash -- I trust P6 to recognise when I
> declare a hash that is restricted to consecutive int keys, is
> ordered, etc. and to optimise accordingly.


The one gotcha that I see here is with the possibility of
multi-dimensional arrays.  In particular, should multi-dimensional
indices be allowed inside square braces?  My gut instinct is yes;
conceptually, the third row of the fourth column is perfectly
reasonable terminology to use.  The thing that would distinguish []
from {} would be a promise to always use zero-based, consecutive
integers as your indices, however many dimensions you specify.  With
that promise, you can always guarantee that the wrap-around semantics
will work inside [], while nobody will expect them to work inside {}.

In short, the distinction being made here isn't unshaped vs.
shaped; it's ordinal indices vs. named indices, or ordinals
vs. keys.

That said, note that - in the current conception, at least - one of
the defining features of a shaped array is that trying to access
anything outside of the shape will cause an exception.  How would
shapes work with the ordinals-and-keys paradigm?

First: Ordinals have some severe restrictions on how they can be
shaped, as specified above.  The only degrees of freedom you have are
how many dimensions are allowed and, for each dimension, how many
ordinals are permitted.  Well, also the value type (although the key
type is fixed as Int where 0..*).  So you could say something like:

 my @array[2, 3, *]

...which would mean that the array must be three-dimensional; that the
first dimension is allowed two ordinals, the second is allowed three,
and the third is allowed any number of them - i.e., 'my @array[^2; ^3;
0..*]' in the current syntax.  Or you could say:

 my @array[2, **, 2]

...meaning that you can have any number of dimensions, but the first
and the last would be constrained to two ordinals each: 'my @array[^2;
**; ^2]'.

Note the use of commas above.  Since each dimension can only take a
single value (a non-negative integer), there's no reason to use a
multidimensional list to define the shape.  Personally, I like this
approach: it strikes me as being refreshingly uncluttered.

Furthermore, you could do away with the notion of shaped vs.
unshaped: just give everything a default shape.  The default shape
for arrays would be '[*]' - that is, one dimension with an
indeterminate number of ordinals.

Meanwhile, shapes for {} would continue to use the current syntax.
'[$x, $y, $z]' would be nearly equivalent to '{0..^$x; 0..^$y;
0..^$z}'.
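
The ordinal-shape idea can be approximated in Python with a small class whose dimensions each allow a fixed number of zero-based ordinals, with negative ordinals wrapping around and anything outside the shape raising an exception. The class name ShapedArray is hypothetical, and the sketch does not model the unbounded '*' or '**' dimensions.

```python
class ShapedArray:
    """Fixed-shape array indexed by zero-based ordinals in every dimension."""

    def __init__(self, *shape, fill=None):
        self.shape = shape
        self._fill = fill
        self._cells = {}

    def _resolve(self, index):
        if not isinstance(index, tuple):
            index = (index,)
        if len(index) != len(self.shape):
            raise IndexError("wrong number of dimensions")
        resolved = []
        for i, size in zip(index, self.shape):
            if i < 0:
                i += size                      # wrap-around: -1 is the last ordinal
            if not 0 <= i < size:
                raise IndexError("ordinal outside the declared shape")
            resolved.append(i)
        return tuple(resolved)

    def __getitem__(self, index):
        return self._cells.get(self._resolve(index), self._fill)

    def __setitem__(self, index, value):
        self._cells[self._resolve(index)] = value


grid = ShapedArray(2, 3)   # two ordinals in the first dimension, three in the second
grid[1, 2] = "x"
```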


> If there are no meaningful lookup keys, if all I can do to get
> through my list is count the items, then an array is called for, and
> it can work in the usual way: start at 0, end at -1.  It is useful to
> be able to count past the ends of an array, and * can do this by
> going beyond the end: *+1, *+2, etc., or before the beginning: *-1,
> *-2, etc.  (This neatly preserves the notion of * as all the
> elements -- *-1 is the position before everything, and *+1 is the
> position after everything else.)


Regardless, I would prefer this notion to the offset from the
endpoint notion currently in use.  Note, however, that [*-1] wouldn't
work in the ordinals paradigm; there simply is nothing before the
first element.  About the only use I could see for it would be to
provide an assignment equivalent of unshift: '@array[*-1] = $x'
could 

[svn:perl6-synopsis] r13706 - doc/trunk/design/syn

2007-02-27 Thread larry
Author: larry
Date: Tue Feb 27 15:41:10 2007
New Revision: 13706

Modified:
   doc/trunk/design/syn/S09.pod

Log:
Some clarifications requested by Geoffrey Broadwell++.


Modified: doc/trunk/design/syn/S09.pod
==
--- doc/trunk/design/syn/S09.pod	(original)
+++ doc/trunk/design/syn/S09.pod	Tue Feb 27 15:41:10 2007
@@ -12,9 +12,9 @@
 
   Maintainer: Larry Wall [EMAIL PROTECTED]
   Date: 13 Sep 2004
-  Last Modified: 30 Jan 2007
+  Last Modified: 27 Jan 2007
   Number: 9
-  Version: 16
+  Version: 17
 
 =head1 Overview
 
@@ -50,10 +50,12 @@
 uint32
 uint64
 
+num16
 num32
 num64   (aka num on most architectures)
 num128
 
+complex16
 complex32
 complex64   (aka complex on most architectures)
 complex128
@@ -79,23 +81,70 @@
 associate additional names, such as short or single.  These are
 not provided by default.  An implementation of Perl is not required
 to support 64-bit integer types or 128-bit floating-point types unless
-the underlying architecture supports them.
+the underlying architecture supports them.  16-bit floating-point is
+also considered optional in this sense.
 
 And yes, an C<int1> can store only -1 or 0.  I'm sure someone'll think of
 a use for it...
 
+Note that these are primarily intended to represent storage types;
+the compiler is generally free to keep all intermediate results in
+wider types in the absence of declarations or explicit casts to the
+contrary.  Attempts to store an intermediate result in a location
+that cannot hold it will generally produce a warning on overflow.
+Underflow may also warn depending on the pragmatic context and use
+of explicit rounding operators.  The default rounding mode from
+C<Num> to C<Int> is to truncate the fractional part without warning.
+(Note that warnings are by definition resumable exceptions; however,
+an exception handler is free to either transform such a warning into
+a fatal exception or ignore it completely.)
+
+An explicit cast to a storage type has the same potential to throw an
+exception as the actual attempt to store to such a storage location
+would.
+
+With IEEE floating-point types, we have a bias towards the use
+of in-band C<+Inf>, C<-Inf>, and C<NaN> values in preference to
+throwing an exception, since this is construed as friendlier to vector
+processing and pipelining.  Object types such as C<Num> and C<Int>
+may store additional information about the nature of the failure,
+perhaps as an unthrown exception or warning.
+
 =head1 Compact structs
 
-A class whose attributes are all low-level types can behave as
+A class whose attributes are all low-level value types can behave as
 a struct.  (Access from outside the class is still only through
-accessors, though.)  Whether such a class is actually stored compactly
-is up to the implementation, but it ought to behave that way,
-at least to the extent that it's trivially easy (from the user's
-perspective) to read and write to the equivalent C structure.
-That is, when byte-stringified, it should look like the C struct,
+accessors, though, except when the address of a serialized version of
+the object is used or generated for interfacing to C-like languages.)
+Whether such a class is actually stored compactly is up to the
+implementation, but it ought to behave that way, at least to the
+extent that it's trivially easy (from the user's perspective) to read
+and write to the equivalent C structure.  That is, when serialized
+or deserialized to the C view, it should look like the C struct,
 even if that's not how it's actually represented inside the class.
 (This is to be construed as a substitute for at least some of the
-current uses of C<pack>/C<unpack>.)
+current uses of C<pack>/C<unpack>.)  Of course, a lazy implementation will
+probably find it easiest just to keep the object in its serialized form
+all the time.  In particular, an array of compact structs must be stored
+in their serialized form (see next section).
+
+For types that exist in the C programming language, the serialized
+mapping in memory should follow the same alignment and padding
+rules by default.  Integers smaller than a byte are packed into a
+power-of-two number of bits, so a byte holds four 2-bit integers.
+Datum sizes that are not a power of two bits are not supported
+unless declared by the user with sufficient information to determine
+how to lay them out in memory, possibly with a pack/unpack format
+associated with the class, or with the strange elements of the class,
+or with the types under which the strange element is declared.
+
+Note that a compact struct is itself a value type, so except for
+performance considerations, it doesn't matter how many representations
+of it there are in memory as long as those are consistent.
+
+The packing serialization is performed by coercion to an appropriate
+buffer type.  The unpacking is performed by coercion of such a buffer
+type back to the 

Re: request new Mapping|Hash operators

2007-02-27 Thread Darren Duncan

At 4:45 PM + 2/27/07, Nicholas Clark wrote:

> >  4.  rename():
>
> rename is a Perl 5 builtin. I didn't think that it had been dropped for
> Perl 6.


At 6:22 PM + 2/27/07, Smylers wrote:

> >  1.  join() aka natural_join():
>
> Remember that Perl already has a C<join> function, for joining strings.


To both of these comments, first I want to repeat that we don't have 
to use the names I provided if there are other names or syntax that 
would work better.


As for join(), it already has multiple meanings in Perl.

Not only is join() used for joining strings, but also for joining threads.

Regardless, I see this situation as being similar to Dog.bark() vs 
Tree.bark(); the operators I described take a Mapping or Hash as 
their primary argument, while any other join() or rename() do not, so 
they are very easy to distinguish using normal multi semantics, and 
they don't even look the same visually.
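
The Dog.bark() versus Tree.bark() point, that one name can carry several meanings selected by the type of the primary argument, can be illustrated in Python with functools.singledispatch. This is only an analogy for Perl 6 multi semantics; the join function defined here is hypothetical and unrelated to any built-in.

```python
from functools import singledispatch


@singledispatch
def join(first, *rest):
    """Fallback when no variant matches the type of the first argument."""
    raise TypeError(f"no join() for {type(first).__name__}")


@join.register
def _(separator: str, *strings):
    """String variant: join the strings with a separator."""
    return separator.join(strings)


@join.register
def _(first: dict, *rest):
    """Mapping variant: union, or None if a shared key has differing values."""
    result = dict(first)
    for m in rest:
        for key, value in m.items():
            if key in result and result[key] != value:
                return None
            result[key] = value
    return result


as_string = join(", ", "a", "b", "c")
as_mapping = join({"a": 1}, {"b": 2})
```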


But once again, the functions|operators can have different names.

At 6:22 PM + 2/27/07, Smylers wrote:

> Darren Duncan writes:
> > I believe that ... some common relational operations would be a lot
> > easier to express if Perl 6 had a few more operators that make them
> > concise.
>
> I am prepared to believe that.  But what I'm unclear on is when I'd want
> to perform a common relational operation.
>
> Please could you give an example of something which is useful -- that is
> useful as a means to some other end, not merely useful to somebody who
> has an interest in relational theory -- but which is currently awkward,
> and then give the same example again showing how much better it is with
> your proposed functions?


At 12:51 PM + 2/27/07, Aaron Crane wrote:

> (As it happens, I'm not entirely convinced that these operations are
> generally useful in the same way as, say, multiplication, or string
> concatenation, or cross hypering, but I think that's a side issue.)


I would say that relational operations, in usefulness, rank about 
level with cross hypering and/or set operations, or just one level 
below them.


In functionality, I see relational operations as being like slightly 
more complicated set operations, in that each set element has 
multiple significant parts and the set-like operations can be looking 
at just parts of each element rather than the whole element when 
querying membership, and that the elements of derived sets can have 
different parts than the elements of either of the set operation 
arguments.  It is convenient for elements to be represented using 
Mappings|Hashes.


One common usage scenario, of relational-join in particular, is doing 
operations on tabular data, where you want to know something that 
involves matching up columns in several tables.  For example, say you 
have data tables {suppliers,foods,shipments} and you want to know 
which suppliers, along with their countries, you have received 
orange-coloured foods from.  A country of residence is an attribute 
of a supplier, and colour is an attribute of a food.


Your data, which could come from anywhere, could look like this:

  $suppliers = Set(
    { farm=>'Hodgesons', country=>'Canada' },
    { farm=>'Beckers', country=>'England' },
    { farm=>'Wickets', country=>'Canada' },
  );

  $foods = Set(
    { food=>'Bananas', colour=>'yellow' },
    { food=>'Carrots', colour=>'orange' },
    { food=>'Oranges', colour=>'orange' },
    { food=>'Kiwis', colour=>'green' },
    { food=>'Lemons', colour=>'yellow' },
  );

  $shipments = Set(
    { farm=>'Hodgesons', food=>'Kiwis', qty=>100 },
    { farm=>'Hodgesons', food=>'Lemons', qty=>130 },
    { farm=>'Hodgesons', food=>'Oranges', qty=>10 },
    { farm=>'Hodgesons', food=>'Carrots', qty=>50 },
    { farm=>'Beckers', food=>'Carrots', qty=>90 },
    { farm=>'Beckers', food=>'Bananas', qty=>120 },
    { farm=>'Wickets', food=>'Lemons', qty=>30 },
  );

If the join() and semijoin() operators that I described existed, then 
the query could look like this:


  $supp_of_oran_food = Set( $suppliers.values XsemijoinX
    ($shipments.values XjoinX $foods.values join { colour=>'orange' }) );


Or if higher-level operators were made that worked on entire sets of 
mappings, or relations (which can have multiple indexes) instead, the 
above query could look more like this instead:


  $supp_of_oran_food = $suppliers semijoin ($shipments join $foods
    join Set( { colour=>'orange' } ) );


The result is then:

  Set(
    { farm=>'Hodgesons', country=>'Canada' },
    { farm=>'Beckers', country=>'England' },
  );

Without join and similar operators, you would have to explicitly 
iterate over each element of each Mapping|Hash and do comparisons 
between keys and values|elements, which is considerably more verbose.
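For readers who want to check the example by hand, here is a minimal sketch of the query in Python rather than Perl 6, since the proposed operators do not exist.  The function names and the rule that two mappings fail to join when a shared key has different values are assumptions made for illustration, not the proposed Perl 6 semantics.

```python
def natural_join(m1, m2):
    """Merge two mappings, or return None if a shared key has differing values.

    Returning None on a mismatch is an assumption made for this sketch.
    """
    for key in m1.keys() & m2.keys():
        if m1[key] != m2[key]:
            return None
    return {**m1, **m2}


def join(r1, r2):
    """Join every pair of mappings from two collections, dropping duplicates."""
    out = []
    for a in r1:
        for b in r2:
            merged = natural_join(a, b)
            if merged is not None and merged not in out:
                out.append(merged)
    return out


def semijoin(r1, r2):
    """Keep each mapping of r1 that joins with at least one mapping of r2."""
    return [a for a in r1 if any(natural_join(a, b) is not None for b in r2)]


suppliers = [
    {"farm": "Hodgesons", "country": "Canada"},
    {"farm": "Beckers", "country": "England"},
    {"farm": "Wickets", "country": "Canada"},
]
foods = [
    {"food": "Bananas", "colour": "yellow"},
    {"food": "Carrots", "colour": "orange"},
    {"food": "Oranges", "colour": "orange"},
    {"food": "Kiwis", "colour": "green"},
    {"food": "Lemons", "colour": "yellow"},
]
shipments = [
    {"farm": "Hodgesons", "food": "Kiwis", "qty": 100},
    {"farm": "Hodgesons", "food": "Lemons", "qty": 130},
    {"farm": "Hodgesons", "food": "Oranges", "qty": 10},
    {"farm": "Hodgesons", "food": "Carrots", "qty": 50},
    {"farm": "Beckers", "food": "Carrots", "qty": 90},
    {"farm": "Beckers", "food": "Bananas", "qty": 120},
    {"farm": "Wickets", "food": "Lemons", "qty": 30},
]

orange_shipments = join(join(shipments, foods), [{"colour": "orange"}])
result = semijoin(suppliers, orange_shipments)
print(result)
```

Even with the helpers factored out, the nested loops are still visible, which is the verbosity that built-in operators would hide.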


In conclusion, I consider functionality like relational-join to 
provide considerable conciseness to very common data processing 
operations, which given their nature, would likely get a speed 
benefit from being implemented at the same low level that set 
operations in general are.


And the operators could have different names if necessary.

Pardon me if I missed addressing some part of an 

Re: Low-level types and over/underflow

2007-02-27 Thread Darren Duncan

At 6:15 AM -0800 2/27/07, Geoffrey Broadwell wrote:

> What happens when a low-level type overflows or underflows?  For
> example, if you decrement an int1 twice, what happens?
>
> If you increment a uint8 past 255, do you get:
>
> 1. A uint8  with value 0?
> 2. A uint16 with value 256?
> 3. A failure?
>
> What about incrementing an int8 past 127?  Do you get:
>
> 1. An int8  with value -128?
> 2. A uint8  with value 128?
> 3. An int16 with value 128?
> 4. A failure?
>
> In both cases, I think option 1 is best, but I can see that option 2 in
> the signed case might make sense in certain circumstances.  Personally,
> I'd prefer to keep option 1 always, because if I want option 2 I should
> cast to uint or a larger int first.


I believe that the answer to this is whatever the underlying 
hardware does.  These lowercased types are unboxed and directly 
reflect their CPU native equivalents, AFAIK.  That said, I suspect 
that either a wraparound or an overflow code is what you'd get, and 
not a type upgrade. -- Darren Duncan
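To make the wraparound case concrete, the following is a small Python sketch of option 1 only, assuming two's-complement storage.  The helper names are hypothetical, and nothing here says what Perl 6 native types will actually do.

```python
def wrap_unsigned(n, bits):
    """Reduce n modulo 2**bits, as an unsigned register of that width would."""
    return n % (1 << bits)


def wrap_signed(n, bits):
    """Reduce n into the two's-complement range of a signed `bits`-wide integer."""
    half = 1 << (bits - 1)
    return (n + half) % (1 << bits) - half


print(wrap_unsigned(255 + 1, 8))  # uint8 incremented past 255
print(wrap_signed(127 + 1, 8))    # int8 incremented past 127
print(wrap_signed(0 - 2, 1))      # int1 (range -1..0) decremented twice from 0
```

Under these assumptions the three calls give 0, -128 and 0 respectively; an implementation that raises an overflow condition instead would correspond to the "failure" options.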


[svn:perl6-synopsis] r13707 - doc/trunk/design/syn

2007-02-27 Thread larry
Author: larry
Date: Tue Feb 27 15:56:44 2007
New Revision: 13707

Modified:
   doc/trunk/design/syn/S03.pod

Log:
Modifiers on bit shifts.


Modified: doc/trunk/design/syn/S03.pod
==
--- doc/trunk/design/syn/S03.pod	(original)
+++ doc/trunk/design/syn/S03.pod	Tue Feb 27 15:56:44 2007
@@ -12,9 +12,9 @@
 
   Maintainer: Larry Wall [EMAIL PROTECTED]
   Date: 8 Mar 2004
-  Last Modified: 21 Feb 2007
+  Last Modified: 27 Feb 2007
   Number: 3
-  Version: 101
+  Version: 102
 
 =head1 Overview
 
@@ -540,6 +540,9 @@
 
     +>
 
+By default, signed types do sign extension, while unsigned types do not, but
+this may be enabled or disabled with a C<:signed> or C<:!signed> adverb.
+
 =item *
 
 infix:<~&>, string bitwise and
@@ -548,16 +551,19 @@
 
 =item *
 
-infix:{'~<'}, string bitwise shift left
+infix:{'~<'}, buffer bitwise shift left
 
     ~<
 
 =item *
 
-infix:{'~>'}, string bitwise shift right
+infix:{'~>'}, buffer bitwise shift right
 
     ~>
 
+Sign extension is not done by default but may be enabled with a C<:signed>
+adverb.
+
 =item *
 
 infix:<?&>, boolean bitwise and
@@ -566,6 +572,10 @@
 
 =back
 
+Any bit shift operator may be turned into a rotate operator with the
+C<:rotate> adverb.  If C<:rotate> is specified, the default is
+unsigned, and you may not explicitly specify a C<:signed> adverb.
+
 =head2 Additive precedence
 
 =over
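
The difference between a plain right shift, a sign-extending right shift, and a rotate can be sketched on an 8-bit pattern as follows.  This is Python for illustration only; the function names and the `signed` keyword are assumptions standing in for the adverbs described in the change above, not Perl 6 syntax.

```python
BITS = 8
MASK = (1 << BITS) - 1


def shift_right(value, count, signed=False):
    """Shift an 8-bit pattern right; with signed=True the top bit is replicated."""
    value &= MASK
    count = min(count, BITS)
    shifted = value >> count
    if signed and value & (1 << (BITS - 1)):
        # Sign extension: fill the vacated high bits with ones.
        shifted |= (MASK << (BITS - count)) & MASK
    return shifted


def rotate_right(value, count):
    """Rotate an 8-bit pattern right; bits leaving the low end re-enter at the top."""
    value &= MASK
    count %= BITS
    return ((value >> count) | (value << (BITS - count))) & MASK


print(f"{shift_right(0b10010000, 2):08b}")
print(f"{shift_right(0b10010000, 2, signed=True):08b}")
print(f"{rotate_right(0b10010001, 1):08b}")
```

A rotate never vacates any bit positions, so there is nothing for sign extension to fill, which is why combining the two makes no sense.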


Re: Expressions with mixed types including low-level types

2007-02-27 Thread Larry Wall
On Tue, Feb 27, 2007 at 09:35:49AM -0800, Geoffrey Broadwell wrote:
: On Tue, 2007-02-27 at 09:20 -0800, Geoffrey Broadwell wrote:
:  How is casting and coercion handled with expressions involving mixed low
:  and high level types?
:  
:  For example, what is the output of this?
:  
:  my Int  $ten = 10;
:  my int4 $a   = 0;
:  my int4 $b;
:  
:  $b = ($a + 2.4 * $ten) / 4;
:  say $b;
:  
:  The answers to the above questions may alter my view on the proper
:  handling of overflow during casting/coercion.
: 
: And for those who think the above code is too easy -- and I can see at
: least 1, 2, 5, and 6 as defensible answers -- try 2.8 instead of 2.4.

I think either 5 or 6 is correct.  See the recent S09 update.

Larry


Re: Compact structs and byte-stringification

2007-02-27 Thread Larry Wall
On Tue, Feb 27, 2007 at 06:54:50AM -0800, Geoffrey Broadwell wrote:
: How do you specify that you want to byte-stringify a compact struct,
: rather than normal stringify it?

Coerce to a buffer type rather than using ~.

: Does the byte-stringified version include internal and/or trailing
: alignment padding?  How do you specify the other choices?

By default it's as like C as possible.  Other choices would have to act
something like pack templates and be hung on some appropriate declaration.

: Whether or not trailing padding is included when byte-stringifying a
: single compact struct, is the choice the same when byte-stringifying an
: array of same?  In other words, are you guaranteed that the
: byte-stringify of an array of compact structs is merely the
: concatenation of the byte-stringification of each struct?

Yes.
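
As a rough analogy in another language, Python's ctypes lays structures out with C alignment rules, so it can illustrate both the default padding and the array guarantee.  The `Sample` struct and its fields are hypothetical, and this shows C-style layout in general, not the Perl 6 buffer coercion itself.

```python
import ctypes


class Sample(ctypes.Structure):
    # Hypothetical compact struct: a 1-byte field followed by a 4-byte field,
    # so C alignment rules normally insert padding after `flag`.
    _fields_ = [("flag", ctypes.c_uint8), ("count", ctypes.c_uint32)]


one = Sample(1, 2)
packed_one = bytes(one)  # serialized form, including any padding bytes

array = (Sample * 3)(Sample(1, 2), Sample(3, 4), Sample(5, 6))
packed_array = bytes(array)

# The serialized array is the concatenation of the serialized elements.
concatenated = b"".join(bytes(item) for item in array)

# Unpacking is the reverse step: rebuild a struct from the bytes.
restored = Sample.from_buffer_copy(packed_one)

print(ctypes.sizeof(Sample), len(packed_one), len(packed_array))
print(packed_array == concatenated, restored.flag, restored.count)
```

On common platforms the struct occupies more than the 5 bytes its fields need, and the extra bytes are the padding that the questions above ask about.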

Larry


Re: Casting and low-level types

2007-02-27 Thread Larry Wall
On Tue, Feb 27, 2007 at 06:26:18AM -0800, Geoffrey Broadwell wrote:
: What happens when you cast between low-level types?  If the source value
: is out of range of the destination type, do you get:
: 
: 1. An exception?
: 2. Clip to finite range always?
: 3. Clip to finite range for ints, clip to infinities for nums?
: 4. Exception when dest is int, clip to infinities when dest is num? 
: 5. Copy bits that fit from source to dest and reinterpret?
: 
: Personally, I think option 1 or option 4 make the most sense for
: conversion between int, uint, num, and complex types, while either
: option 1 or option 5 make sense for conversion to/from buf types.

Basically it's 4, except that the exception is a warning (which is
a form of resumable exception in Perl 6.)

: Also, when casting from a num type to an int type, is there a way to
: specify desired rounding/truncation behavior in a way that allows the
: most efficient code under the covers, rather than making a side trip
: through Num and Int?

Depends on what you call to perform the rounding.  Could be anything
from a macro to a multimethod.  The default round() is presumably a
multimethod on Num that produces an Int, but you can always define
more specific multis or functions or macros, or whack the compiler
upside the head with a pragma.

Larry


[svn:perl6-synopsis] r13708 - doc/trunk/design/syn

2007-02-27 Thread larry
Author: larry
Date: Tue Feb 27 17:00:09 2007
New Revision: 13708

Modified:
   doc/trunk/design/syn/S03.pod

Log:
Better writing requested by John Macdonald++


Modified: doc/trunk/design/syn/S03.pod
==
--- doc/trunk/design/syn/S03.pod	(original)
+++ doc/trunk/design/syn/S03.pod	Tue Feb 27 17:00:09 2007
@@ -573,8 +573,8 @@
 =back
 
 Any bit shift operator may be turned into a rotate operator with the
-C<:rotate> adverb.  If C<:rotate> is specified, the default is
-unsigned, and you may not explicitly specify a C<:signed> adverb.
+C<:rotate> adverb.  If C<:rotate> is specified, the concept of
+sign extenstion is meaningless, and you may not specify a C<:signed> adverb.
 
 =head2 Additive precedence
 


[svn:perl6-synopsis] r13709 - doc/trunk/design/syn

2007-02-27 Thread larry
Author: larry
Date: Tue Feb 27 17:01:23 2007
New Revision: 13709

Modified:
   doc/trunk/design/syn/S03.pod

Log:
gah


Modified: doc/trunk/design/syn/S03.pod
==
--- doc/trunk/design/syn/S03.pod	(original)
+++ doc/trunk/design/syn/S03.pod	Tue Feb 27 17:01:23 2007
@@ -574,7 +574,7 @@
 
 Any bit shift operator may be turned into a rotate operator with the
 C<:rotate> adverb.  If C<:rotate> is specified, the concept of
-sign extenstion is meaningless, and you may not specify a C<:signed> adverb.
+sign extension is meaningless, and you may not specify a C<:signed> adverb.
 
 =head2 Additive precedence