Re: RFC 268 (v1) Keyed arrays

Glenn Linderman Thu, 21 Sep 2000 22:57:59 -0700
Michael Maraist wrote:

> >     my/our @array :hashsyntax;
> >
> > would hide the definition of %array in the same way that
> >
> >     my/our %array
> >
> > would  hide a  prior definition  of %array.   And references  to %array
> > would thenceforth actually be references to the keyed array @array.
>
> I can see massive confusion from this.

OK, this sounds like a vote against the :hashsyntax variation.  That variation
was a stretch, meant to help speed up accessor functions without requiring a
syntax change.  Speeding up accessor functions is what inspired the idea (RFC
163, oops, I forgot that one in my References list, just added it).

> It's bad enough that globbing allows
> multiple data-types with the same name,

Could you elucidate here?  Oh, type globs.  Oh you mean $foo vs @foo vs %foo, I
think.  OK, correct me if I'm wrong, but I think I'm with you.

> and it's really bad that the casual
> observer has to stretch their mind like crazy to interpret @$sym{qw(a b c)},
> which non-intuitively changes context every character (scalar to ref to
> array ref to hash-slice ref, etc ).

Yes, that is not an expression for a casual observer.  But it is explicit, once
you know the rules.
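
To spell out those rules, here is a minimal Perl 5 sketch.  The variable names
and data are made up for illustration; only the slice expression itself comes
from the message above.

```perl
use strict;
use warnings;

# $sym holds a reference to a hash (illustrative data only).
my $sym = { a => 1, b => 2, c => 3, d => 4 };

# Reading the expression piece by piece:
#   qw(a b c)     a list of three keys
#   {...}         curly braces, so the thing being subscripted is a hash
#   $sym          the scalar holding the hash reference
#   leading @     the result is a list, which makes this a hash slice
my @values = @$sym{qw(a b c)};

# The same slice with the dereference written out in full.
my @same = @{$sym}{qw(a b c)};

print "@values\n";   # prints "1 2 3"
```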

> Then came pseudo-hashes, which
> thankfully aren't used for too much other than objects (where their true
> natures are somewhat hidden).

And hopefully pseudo-hashes can be eliminated.

> Now you're suggesting colliding globbing name-spaces.

But only for those variables for which the user requests it.

> What I think you're saying is that hashes and arrays would be totally merged
> into a singular data-type.

Not.  Arrays and hashes would still both exist.  Hashes would remain identical
to today, but would be unencumbered by pseudo-hash code (hopefully, RFC 241).
Arrays (as implemented today) would be exactly available via :nokey.  In fact,
until the user actually uses a key with a :key array, it is no different than a
:nokey array.  It is for that reason that I proposed making :key the default: it
would not alter the semantics of any existing program under perl<6.  But after
reading your comments, I've realized that splice must be prohibited on :key
arrays, so I'm changing the default in v2 to :nokey, which will still support
splice.

> That might work, but we'd be alienating existing
> users that don't want to retrain but work along side perl6 developers.  I'm
> not saying it's bad, just that I know it'll affect my work environment.

So I don't think this conclusion is correctly drawn, as both arrays and hashes
would still exist.  I've rephrased my abstract to try to avoid misleading people
regarding the continued existence of hashes.  Does that help?

> > The syntaxes
> >
> >     $foo['element']
> >     $foo{element]
>
> Typo.  Which was it $foo{element} or $foo[element]?

Thank you.  It was intended to be (and will be in v2) $foo[element].

> > So, starting with
> >
> >    my @foo:key; # empty array
> >    $foo ['month'] = 10;  #  $#foo == 1, $foo[0] == 10
> >    $foo ['day'] = 20;   # $#foo == 2, $foo [1] == 20
> >    $foo ['year'] = 30;   # $#foo = 3, $foo [2] == 30
> > We achieve an array with 3 elements.  There is a clear parallel between
> > this and
> >
> >    my %foo;
> >    $foo{'month'} = 10;
> >    $foo{'day'} = 20;
> >    $foo{'year'} = 30;
> >
> > However, the lookups for @foo are done at compile time, the lookups for
> > %foo are done at runtime.
>
> Ok, implementation problem.  Are you suggesting that we really implement
> pseudo-hashes behind the scene?  You mention name-spaces, but afaik, these
> are just normal perl-hashes (with special values).  This sounds more like
> using pseudo-hashes behind the scenes.

Not quite pseudo-hashes, but similar in some ways to pseudo-hashes.  That's why
Mr. Schwern and I had an extensive discussion in the -object list (eventually
moved off-line, but the discussion started there re: RFC 163) regarding just
exactly what I was proposing here, and culminating in the quote that I placed at
the top of this RFC's discussion section.  It appears that Mr. Schwern is
somewhat violently opposed to pseudo-hashes' continued existence.  But after much
discussion, he realized that this proposal is different.  He encouraged me to
work up something using a tied array, but then I saw something about a new-RFC
deadline, so decided to submit the ideas so far, before I did that.  So
certainly it was premature.

Indeed, the name space is defined to be a _normal_ perl hash in the
implementation section.

> The difference is that now you're
> requiring a new symbolic hash to be generated for each ":key" array,
> where-as pseudo-hashes allow you to reuse the same old hash (as used in
> perl5.005's OO).

That's one difference, I guess.  Certainly, when you say

@foo = @bar;

the new namespace hash for foo could be a pointer to the same namespace hash as
is used for bar, if it makes sense to reference count and copy on write, or
something along those lines.  That's an implementation choice, of course.  The
logical idea is as you say, that each keyed object might have its own namespace
hash.

> As an alternative, use an attribute that does not interpolate the field
> name.

I'm not sure what you mean here.

> Make it more like a c-struct, where you _have_ to fully specify the
> field name.  With this, you could still use the dynamic generation of fields
> as in the above (so you don't have to specify them ahead of time), but by
> the time the compiler is finished, the references are fixed and converted to
> array indexes.

Maybe you could elaborate more on your idea here, so that I can understand it.
I'll say a few words about what I think I proposed, so that you can tell me if
what you are saying is the same or different.  My idea is that for this type of
keyed array, the common usage would be by constant field name.  When the
compiler sees a constant field name, it can look it up at compile time, and emit
code using the index.  At least that is the intention.  I guess that means that
the compiler has to know or see the complete set of fields being used with the
array, and that conflicts with the ability to willy-nilly copy these things
around from one variable to another, and still allow the compiler to emit code
using the index.
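
For comparison, today's Perl 5 can approximate the "constant field name becomes
an index at compile time" part with the constant pragma.  This is only a sketch
of the intent, not the RFC's mechanism: the field layout is written by hand
here, whereas the proposal would have the compiler derive it, and %index_of is
a hypothetical stand-in for the array's namespace hash.

```perl
use strict;
use warnings;

# Hand-written field layout (an assumption for this sketch).
use constant { MONTH => 0, DAY => 1, YEAR => 2 };

my @date;
$date[MONTH] = 10;   # MONTH is inlined as the constant 0 at compile time
$date[DAY]   = 20;
$date[YEAR]  = 30;

# A field name known only at run time still needs a real lookup.
my %index_of = (month => MONTH, day => DAY, year => YEAR);
my $field    = 'day';
my $value    = $date[ $index_of{$field} ];

print "$field = $value\n";   # prints "day = 20"
```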

> Moreover, you have a special type of fixed array (as opposed
> to a resizable one).  You get all the benefits of space / performance of
> c-structures, while not totally throwing away the flexibility of hashes.
> The only way you could achieve a dynamic field-name lookup would be through
> an eval.  This slows its usage down, but if you really wanted to use
> hashes, you'd use some other variable-name attribute.  Actually, thinking
> more on that, you couldn't totally fix the array size if you could 'eval'
> the structure with additional field-names.  The most you could say would be
> that you have a unidirectionally growing array, and it might not be worth
> the while of producing a whole new array structure for just this sort of
> optimization.
>
> >
> > For :key and :initialkey arrays, the syntax
> >
> >     $foo[$bar]
> >
> > would inspect $bar to determine if it is convertible to numeric.  If it
> > is, the value is used as the numeric index of the array.  If it is not,
> > it is treated as a key for the array, and is looked up in the namespace
> > of the array.
>
> Doesn't this conflict with the general usage of hashes?

It would, but the :keyonly and :hashsyntax forms, with which index usage is not
allowed, also allow numeric-looking keys.  It is just :key and :initialkey that
restrict you to non-numeric keys.

> I know I've made
> use of hashes as a form of sparse matrix / array.  DB ID's, for example
> (where you have 3 sequential values starting at 1,234,567,890 ).  Basing
> context on whether the number 'happens' to be numeric is bad (at least for
> me in DB programming).  I don't like the idea of coercing it to a string as
> in $foo[ "$bar" ].

The idea in that last sentence wouldn't even work under this proposal, since a
stringified number is still convertible to numeric.  You'd have to use
something like $foo["k$bar"].  But hashes, and :keyonly and :hashsyntax, would
all permit you to not care about the content of the key.
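
The dispatch rule for :key arrays can be emulated in Perl 5 roughly as follows.
This is a sketch under assumptions: keyed_fetch, @values and %namespace are
hypothetical names, and looks_like_number from the core Scalar::Util module
stands in for whatever "convertible to numeric" test an implementation would
really use.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my @values    = (10, 20, 30);
my %namespace = (month => 0, day => 1, k1 => 2);   # key => index

# Numeric-looking subscripts are indexes; anything else is a key.
sub keyed_fetch {
    my ($subscript) = @_;
    return $values[$subscript] if looks_like_number($subscript);
    my $index = $namespace{$subscript};
    return defined $index ? $values[$index] : undef;
}

my $bar = 1;
print keyed_fetch($bar),    "\n";   # prints 20: treated as index 1
print keyed_fetch("$bar"),  "\n";   # prints 20: "1" still looks numeric
print keyed_fetch("k$bar"), "\n";   # prints 30: "k1" is looked up as a key
```

Quoting $bar changes nothing, which is why a non-numeric prefix is needed to
force key treatment.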

> Also, if you ever mixed keys / indexes, you'd have a serious problem with
> code maintenance.  Try the following:

...

> So, either need to never numerically access index fields, or define
> positional fields with the variable declaration.  Anything else will
> fundamentally hurt the stability of perl-code.  I don't mind dynamic this or
> that, so long as a lazy developer's module doesn't hurt my code any more
> than it has to.

Yes, mixture of keys and indexes would be confusing in the presence of
shift/unshift.  Probably not a good idea to mix them, if shift/unshift is used.
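
A small Perl 5 emulation shows the hazard.  It assumes the keyed array keeps an
offset that is adjusted on unshift so that keys keep pointing at the same
element; that bookkeeping, and the names by_key, %namespace and $offset, are
guesses made for this sketch rather than anything fixed by the RFC.

```perl
use strict;
use warnings;

my @values    = (10, 20, 30);
my %namespace = (month => 0, day => 1, year => 2);   # key => original index
my $offset    = 0;   # assumed bookkeeping: elements added at the front

# Key access compensates for the offset; plain index access does not.
sub by_key { return $values[ $namespace{ $_[0] } + $offset ] }

my $index_before = $values[0];         # 10
my $key_before   = by_key('month');    # 10

unshift @values, 99;
$offset++;

my $index_after = $values[0];          # 99: index 0 is now another element
my $key_after   = by_key('month');     # 10: the key still finds the month

print "index 0: $index_before -> $index_after\n";
print "month:   $key_before -> $key_after\n";
```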

> Also, is it sane to suggest that I could do the following?
> my @hash: bla bla bla;
> $hash[ 100 ]  = ...
> $hash{ field1 }  = ...
> $hash{ field2 }  = ...
>
> In general, intermixing positional parameters and a dynamically assigned
> hash is asking for trouble.

Yes, as mentioned above.  You don't run into problems mixing them until they
move around.  Should I restrict shift/unshift from being used (and eliminate
offset from the implementation), or is it good enough to recommend against the
practice, and let the programmer beware?

> >    my ( @stat_array ) = stat ( $filename );
> >    print "File $filename has a size of $stat_array[size] bytes.\n";
>
> This could still work with a behind-the scenes pseudo-hash.  So long as stat
> defined the structure.
> Of course, you'd have to change the definition of stat to use want-array and
> perform the following:
> my $rh_stats = stat( .. )
> print "File $filename has a size of $rh_stats->{size} bytes.\n";

My example is bad, now that I've switched the default to :nokey.  If this read

  my ( @stat_array :key ) = stat ( $filename );
  print "File $filename has a size of $stat_array[size] bytes.\n";

would that be better?  The stat function would be expected to be able to use
want-array (which would have to be extended to know about keyed arrays in
addition to :nokey arrays) and return the keys along with the values.

But it is still not clear that the compiler could infer what "size" means for
@stat_array, because the keys would be defined inside the stat function, not in
the declaration of @stat_array.  So that could hardly be compiled to a constant
reference unless the compiler starts doing some powerful inferencing.  (I mean,
that two-line example is simple, but not all examples would be so simple.  Throw
a few function calls between, and who's to say what's in @stat_array?)

I'm going to have to think about that one for a while.
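
For reference, Perl 5 already offers two ways to get at stat results by name,
neither of which is the keyed array proposed here: the core File::stat module,
which returns an object with one method per field, and a plain hash slice over
the built-in stat.  The sketch below creates a temporary file only so that it
can run anywhere; @fields and %stat_of are illustrative names.

```perl
use strict;
use warnings;
use File::stat;                  # overrides stat() to return an object
use File::Temp qw(tempfile);

# Create a five-byte scratch file to inspect.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "hello";
close $fh or die "close failed: $!";

# Named access through the File::stat object.
my $st = stat($filename) or die "stat failed: $!";
printf "File %s has a size of %d bytes.\n", $filename, $st->size;

# Named access through a hash slice over the built-in 13-element list.
my @fields = qw(dev ino mode nlink uid gid rdev size
                atime mtime ctime blksize blocks);
my %stat_of;
@stat_of{@fields} = CORE::stat($filename);
print "Same size via the hash: $stat_of{size}\n";
```

In both cases the field names are resolved at run time, so neither gives the
compile-time index lookup discussed above.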

> Obviously, the dereferencing is annoying.  I'm sure there have been many
> discussions on pro's and cons of pseudo-hashes, so I won't officially
> suggest that you can actually hide the first index of a pseudo-hash, or even
> that you can make a real hash out of it as in:
>
> my %fast_hash: keys( size name ... );
> or (or indirectly via the use of the attributes module).  I'll read through
> this newsgroup to see if I can find out more about this style since it's
> starting to attract me.
>
> I do, however, like the suggestion that positionally significant parameters
> should universally make use of what-ever optimized hash-like interface
> finally gets adopted by perl6.
>
> As a general comment, one of the benefits of using context modifiers like %,
> $, @, [ and { was that a developer could look at an exported variable and
> generally figure out what was going on.  If we use attributes to define the
> characteristics of a variable, then we're looking at a maintenance
> nightmare.

I started this idea thinking that they were really arrays, just minor
variations.  When I got to :hashsyntax, I realized I'd left behind "minor"
variations.  And then the restrictions on operators sank in... these are getting
away from arrays.  And at the moment, it doesn't appear I've solved any of the
claimed problems, either.

> I can see the use of:
> my $x: integer;
> especially since it is of local scope.  More-over, that attribute might only
> be a suggestion to perl, and on first violation ( $x = "hi" ), it could
> throw away that attribute, so that we don't get into problems with returned
> values (and references).
>
> The most common issue that I can imagine would be returned values:
> sub foo { .... return wantarray ? @array_with_attributes :
> \@array_with_attributes }
> my @array = foo();  # no attributes are applied, we're at the mercy of
> default settings.
> my $ra_array = foo();
> $ra_array->[ num or sym ].  # depending on specific attributes, this might
> act differently than expected
>
> We can't do any sort of compile-time checking on references.  (this is why
> type-checking required 'my CLASS $foo').
>
> Additionally, to work properly, you'd have to store run-time attributes and
> meta-data within the target array.  This can be completely non-obvious to
> the user of a module.  I just see that you've returned a scalar ref or an
> array.  But am I supposed to somehow figure out what field positions you've
> used.
> Pseudo hashes solved this problem by passing the structure along with the
> array, and by requiring the use of references.  Your method could return the
> contents of the array / hash, and thus lose information about it.

More thought required than I'm capable of tonight.  (or maybe ever.)

> In short, you've come up with a generic solution to do what pseudohashes
> tried, but I think you're ignoring some of the fundamental problems that had
> to be addressed by pseudo-hashes.  I don't think this is robust enough of an
> idea.

I think I agree with that synopsis.  I'm no pseudo-hash expert... the idea was
formed from other inspirations, and Mr. Schwern pointed out the pseudo-hash
relationship.  I'll be thinking about this some more, and experimenting some,
now that the RFC exists, but if there is no good solution to the problems you've
outlined, or the ones your comments led me to, I'm not sure there'd be much
value left to the RFC.  If solutions do seem to appear, you'll see a new version
of the RFC.

> -Michael
> p.s. Well it IS, after all, a REQUEST for comments. :)

p.s. Absolutely.  And extremely useful comments you supplied.  Thanks.

--
Glenn
=====
Even if you're on the right track,
you'll get run over if you just sit there.
                       -- Will Rogers


