Re: RFC 163 (v2) Objects: Autoaccessors for object data structures

Michael G Schwern Mon, 18 Sep 2000 15:11:27 -0700
On Mon, Sep 18, 2000 at 01:26:45PM -0700, Glenn Linderman wrote:
> Michael G Schwern wrote:
> > Similar mistaken logic leads to "globals are faster than lexicals".
> 
> Maybe so, but I'd think lexicals would be faster, because more of
> the lookup is done at compile time rather than runtime... so I'm not
> sure what is similar about the mistaken logic...

A better one would be "dereferences are slower than using the variable
directly", i.e. it's a common misconception that $href->{key} should
be slower than $hash{key} because you have to dereference $href.  This
mostly comes from C, where dereferencing a pointer is slower than using
the variable directly.

Turns out references edge out globals.  Why?  A global variable has to
look up its symbol at run-time, get a *reference* out of it and
dereference it.  Most of the time if you're using a reference, it's
lexical.  All a lexical reference has to do is dereference itself,
thus avoiding the symbol table lookup.

I've seen this done:

        sub foo {
            my($href) = shift;
            local(*hash) = $href;

            ....
        }

as an "optimization".  Turns out that slows things down even further!
Declaring the local version of %hash is much, much more work than
everything else put together.
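
If you want to check this on your own perl, here is a rough sketch using
the core Benchmark module.  The names (%hash, $href, %alias and the three
subs) are made up for illustration, and the relative numbers will depend
on your perl version and build, so treat the output as a measurement
rather than a guarantee.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data; none of these names come from the thread.
our %hash = (key => 1);          # package global, found via the symbol table
my  $href = { key => 1 };        # lexical reference, found in the pad

sub via_global { return $hash{key} }
sub via_lexref { return $href->{key} }
sub via_glob_alias {
    my ($h) = @_;
    our %alias;                  # declare the name so strict is satisfied
    local *alias = $h;           # the glob-aliasing "optimization"
    return $alias{key};
}

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    global     => sub { via_global() },
    lexref     => sub { via_lexref() },
    glob_alias => sub { via_glob_alias($href) },
});
```

All three subs return the same value; only the cost of getting there
differs.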

I guess I'm trying to say something about micro-optimizations being
more trouble than they're worth and usually hurting more than they help.


> So let's posit you've cured the accessor overhead problem.  Now
> we're left with set_const being 40% slower for hash, and set_var
> 166% slower for hash.  Still want to ignore it?  Why?

Well, the fixed accessors would be using constant key lookups, so we
only have to worry about 40%, and this is 40% of a tiny, tiny fraction
of the actual overhead of a typical class.  So going nuts about making
it faster probably won't gain us much overall.  Certainly not worth
the effort of making arrays as easy to use as objects.

Take a class you've written that uses thin accessors and run it
through a profiler.  Look at the time spent in the accessors and
reduce it by 80% (the expected efficiency increase for built-in
accessors) and recalculate its overall effect on your performance.  If
it's more than 5% I would be surprised.

Basically, if accessor methods are currently eating less than 25% of
your total overhead, they will eat 5% if they are made built-in.
After that, diminishing returns kick in.  A further 40% reduction
results in a 2% overall improvement.  Who cares?  Spend the time elsewhere.
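
The arithmetic, spelled out as a throwaway script.  The 100-unit run and
the variable names are assumptions for illustration; the 25%, 80% and 40%
figures are the ones used above.

```perl
use strict;
use warnings;

# Assumed starting point: accessors cost 25 units out of a 100-unit run.
# Integer percentages keep the math exact.
my $total     = 100;
my $accessors = 25;

my $builtin = $accessors * (100 - 80) / 100;   # 80% cheaper accessors
my $tuned   = $builtin   * (100 - 40) / 100;   # a further 40% off

printf "built-in accessors: %d units (saves %d of %d)\n",
    $builtin, $accessors - $builtin, $total;
printf "after more tuning:  %d units (saves %d more)\n",
    $tuned, $builtin - $tuned;
```

The first step saves 20 units of the original 100; the second saves only 2.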


(This is the classic argument against early/micro optimizations)


> > > > I know, lets call it a pseudo-hash!
> > > >
> > > > Been there, done that, worn the scars proudly.
> > >
> > > Is _that_ what a pseudo-hash is?  Then it sounds like a good idea.

Yeah, it *sounds* like a good idea, but there's lots and lots of
little problems.


> >         Doesn't play well with multiple inheritance
> >
> I can sure believe this.  There'd be indexes from multiple base classes.  I
> don't know how Perl does multiple inheritance anyway, so I can't comment
> effectively on whether this is or would be a problem.  If Perl does multiple
> inheritance, I haven't stumbled across the documentation for it, but neither
> have I looked.  I don't use multiple inheritance.

The phrase "multiple inheritance" only comes up in perlboot, perltoot and
perltootc.  perlobj only implies that MI works because otherwise it
would be $ISA, not @ISA.
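
For anyone who hasn't seen it, a minimal sketch of MI through @ISA.  The
class names are hypothetical and exist only for this example.

```perl
use strict;
use warnings;

# Hypothetical classes, purely for illustration.
package Swimmer; sub swim { return "swims" }
package Flyer;   sub fly  { return "flies" }

package Duck;
our @ISA = ('Swimmer', 'Flyer');   # an array, so more than one parent is allowed
sub new { return bless {}, shift }

package main;
my $duck = Duck->new;
print join(", ", $duck->swim, $duck->fly), "\n";
```

Method lookup walks @ISA left to right, depth first, so a Duck finds
swim() in Swimmer and fly() in Flyer.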

I use MI a lot and really couldn't see using a language without it.


> >         Muddles the behavior of typed variables
> >
> Not sure what this means.

Currently, the only thing really using the C<my Dog $spot> syntax is
pseudo-hashes.

    my Dog $ph;
    $ph->{cat} = 'Mrs. Chippy';  # $ph->[$Dog::FIELDS{cat}]

and there have been several RFCs about clarifying what typed variables
mean (usually in reference to objects).  Pseudo-hashes get in the way
of a lot of those proposals.


> >         Requires significant extra documentation and complication of
> >                 hash operations.
> >
> I've used perl for some years, and never noticed pseudo-hashes from the user
> perspective.  Is this an internals only issue?  Or what else have I missed?

It's both an internals and a doc issue.  In the guts, there's a bunch
of special cases for them (though not nearly so troublesome as, say,
threading), although most of the functionality is restricted to av.c.
The documentation is the bigger issue: pseudo-hashes involve a lot of
caveats when explaining hashes, though many of them have gone away in
5.6.0.

Also, the whole fields and base modules are troublesome.  If you wish
to write a subclass but use a pseudohash for your object instead of a
hash, you really can't unless the class author was careful enough to
declare all their fields (a rare occurrence).  Also, consider the case
of a pseudo-hash friendly class, but with a subclass that uses @ISA
directly instead of base.pm and hashes instead of pseudo-hashes.  A
subclass of that subclass will no longer see the fields and thus the
pseudo-hashes are wrecked.


> My proposal is different, because it would require additional
> complication of array operations.  Hashes wouldn't be affected at
> all.  

You're just shifting the additional complexities from hashes to arrays.

> Arrays would be augmented with an internal hash (probably) to
> do the key to index translation at compile time, the run-time code
> wouldn't notice that.

Consider the following:

    package Parent;

    # Allows $o->[bar] == $o->[1]
    use afields qw(foo bar yar car);

    sub new { bless [], $_[0] }

    sub foo {
        my $self = shift;
        $self->[foo] = shift;
    }

Simple enough.  Now what happens when MI steps in:

    package Parent2;

    # bar == 0, up == 1, foo == 2
    use afields qw(bar up foo);

    sub myfoo {
        my $self = shift;
        $self->[foo] = shift;
    }
    
    package Kid;

    use base qw(Parent Parent2);

    $k = Kid->new;

    $k->foo(42);   
    $k->myfoo(23);
    print $k->[foo], $k->[0], $k->[2];

Uh oh.  What happens now?!  Whose 'foo' does Kid inherit?  What
surprising things are in $k->[0] and $k->[2]?
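
To make the collision concrete, here's a sketch that runs on a stock
perl.  afields is not a real module, so each class's name-to-index map
(%P1 and %P2, both hypothetical) is written out by hand to mimic what it
would presumably generate.

```perl
use strict;
use warnings;

# Hand-written stand-ins for the index maps afields would build.
package Parent;
my %P1 = (foo => 0, bar => 1, yar => 2, car => 3);
sub new { return bless [], $_[0] }
sub foo { my $self = shift; $self->[ $P1{foo} ] = shift }

package Parent2;
my %P2 = (bar => 0, up => 1, foo => 2);
sub myfoo { my $self = shift; $self->[ $P2{foo} ] = shift }

package Kid;
our @ISA = ('Parent', 'Parent2');

package main;
my $k = Kid->new;
$k->foo(42);      # Parent's foo writes slot 0, which Parent2 calls "bar"
$k->myfoo(23);    # Parent2's foo writes slot 2, which Parent calls "yar"
print "slot 0 = $k->[0], slot 2 = $k->[2]\n";
```

Each parent silently scribbles on a slot the other parent thinks it owns
under a different name.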


> >         Inconsistencies between typed and untyped access.
> >
> I don't know what this means, either.

    my Dog $ph = [\%Cat::FIELDS];
    $ph->{name} = 'Foofer';  # $ph->[$Dog::FIELDS{name}]

    foo($ph);

    sub foo {
        my $ph = shift;
        print $ph->{name};  # $ph->[$ph->[0]{name}]
    }

Forget to type your lexicals and you might get something really really
weird.

> >         Pseudo-hashes, unless used very carefully, often turn out slower
> >                 than hashes.
> >
> Maybe so.  I'm not sure why, or why not, or what all the restrictions on
> pseudo-hashes are.

Untyped pseudohashes have to look at $ph->[0] to do their key-to-index
translation.  So in effect you have to do an array lookup, a hash
lookup and then another array lookup.  Typed pseudohashes are compiled
to their array representation and only involve an array lookup.

In the end, it means untyped pseudohashes are 15% slower than hashes.
And it's not always possible to type.
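
The two access paths can be imitated with an ordinary array, no
pseudo-hash magic required.  This is only an emulation of the lookup
steps described above, with made-up field names; it is not how the
interpreter implements them.

```perl
use strict;
use warnings;

# Slot 0 holds the key-to-index map, the rest hold the values.
my $ph = [ { name => 1, breed => 2 }, 'Foofer', 'mutt' ];

# Untyped access: array lookup (slot 0), hash lookup, array lookup.
my $untyped = $ph->[ $ph->[0]{name} ];

# Typed access: 'name' was already resolved to index 1 at compile time.
my $typed = $ph->[1];

print "$untyped / $typed\n";
```

Both paths reach the same element; the untyped one just takes two extra
lookups to get there.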


> > Pseudo-hashes were added to solve three problems: restrict keyspace,
> >
> Not part of my proposal.

Obviously part of your proposal.  You'll have a strictly defined set
of keys, unless you want new keys to magically append to the array?


> > reduce memory usage
> >
> Not part of my proposal.  May be a side effect, I doubt it, though.

If you do your proposal on a per-class basis, you're going to win some
memory (but nothing to write home about).  If you do it on a per
object basis, you're going to lose a lot, since each AV would have an
associated HV.


> After looking at these points, I'm missing how you jumped to the
> conclusion that I'm proposing pseudo-hashes.... they seem quite
> different than my proposal in many details.

You're proposing that string-based keys be mapped directly onto a
numerically indexed array.  That's pseudohashes in a nutshell.  Replace
"mapped" with "pseudo-randomly mapped" and you've got hashes in a
nutshell.

The only real difference is that you are using [] instead of {}, and
the $a->[string] syntax relieves some of the ambiguities of pseudohash
vs hash access.


-- 

Michael G Schwern      http://www.pobox.com/~schwern/      [EMAIL PROTECTED]
Just Another Stupid Consultant                      Perl6 Kwalitee Ashuranse
But why?  It's such a well designed cesspool of C++ code.  Why wouldn't
you want to hack mozilla?
                -- Ziggy