Dan,
Thanks for being a good sport. I'm not in a hurry here - don't feel like you
need to be.
> >> I propose that keyed access do exactly eight things:
> >>
> >> * fetch a PMC using a key
> >> * fetch an integer using a key
> >> * fetch a number using a key
> >> * fetch a string using a key
> >> * store PMC
> >> * store int
> >> * store num
> >> * store string
> >>
> >> To add to a PMC, the PMC would be fetched, then a separate
> >> instruction would add to it. This returns keys to their roots:
> >> merely optimizing access to deeply stored items.
> >
> >You may well be right. I am certainly concerned about the amount of
> >cut and paste duplication involved at the moment.
>
> [Dan:]
>
> Sounds like a good case for adding some smarts to the pmc and opcode
> preprocessor.
>
This would only automate the generation of large amounts of code, not get
rid of the large amount of code being generated. Once again, my complaint
here is that the L2 cache would buckle under the weight of a dozen PMCs
each defining a few dozen recursive accessors. The performance gain from
making the code smaller is worth the cost of two trips through the main
interpreter loop.
Stepping back a bit, my suggestion is a more general one: factor the
recursive fetch out of the operation. Keeping keyed access as a single
PVM op, the PVM could use the vtable API to fetch the buried
PMC/int/num/string, then fall through to the logic that performs
operations on immediately available values. Having two separate VM ops
isn't strictly required to factor this out. Restated: any time you
massively cut and paste code, it's an indication of trouble ;)
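To make that concrete, here's a minimal sketch of the factoring I have
in mind. Every name in it (PMC_vtable, key_atom, key_fetch_pmc, and so
on) is made up for illustration - this is not the actual Parrot API:

    #include <stddef.h>

    /* Illustrative types only - not the real Parrot structures. */
    typedef struct PMC PMC;

    typedef struct PMC_vtable {
        PMC  *(*get_pmc_keyed_int)(PMC *self, int index);
        void  (*add_int)(PMC *self, int amount);
    } PMC_vtable;

    struct PMC {
        PMC_vtable *vtable;
        void       *data;
    };

    typedef struct key_atom {
        int              index; /* index into this level of the aggregate */
        struct key_atom *next;  /* next level down, or NULL at the leaf */
    } key_atom;

    /* One shared routine walks the key chain and fetches the buried PMC. */
    PMC *key_fetch_pmc(PMC *agg, const key_atom *key)
    {
        for (; key != NULL; key = key->next)
            agg = agg->vtable->get_pmc_keyed_int(agg, key->index);
        return agg;
    }

    /* The keyed op then falls through to the same logic used for an
     * immediately available value. No PMC class has to define its own
     * recursive keyed-add, keyed-subtract, and so on. */
    void op_add_keyed_int(PMC *agg, const key_atom *key, int amount)
    {
        PMC *target = key_fetch_pmc(agg, key);
        target->vtable->add_int(target, amount);
    }

The eight keyed entry points from the proposal above all reduce to
variations of key_fetch_pmc; everything after the fetch is the ordinary
non-keyed code path.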
> > 1) KEY *'s and their atoms are allocated from memory. The memory allocation
> > hurts more than it saves, cycle-wise.
> > The number of CPU cycles used doing a virtual machine instruction *pales*
> > in comparison to what is needed to allocate memory.
>
> Wrong. Keys are fixed-sized objects, and perfect candidates for the
> object allocator, which is screamingly fast. [...] Changing the
> variable *contents* doesn't require changing any key structure. This:
>
> for my $i (1..1000) {
>     for my $j (1..1000) {
>         for my $k (1..1000) {
>             @foo[$i;$j;$k]++;
>         }
>     }
> }
>
> requires the construction of exactly *one* key structure, and
> requires that it be updated exactly *once*, through the entire loop.
....
> PMCs don't move. Keys can cache pointers to them. This isn't a problem.
The definition I have for KEY * here is a linked list. However, if it's
true that nested access generates just one key and then reuses it, that's
awesome, and the allocation overhead is a total win.
I'm sorry I missed that gem looking at the source. The rough edges do
obscure the gems. If I can't find this gem at all, I'll be looking for
pointers on bringing it to light.
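If that's how it works, I'd picture the loop above compiling down to
something like this (reusing the hypothetical key_atom and
op_add_keyed_int from my sketch earlier - and I gather Dan's version
goes one better by having the atoms reference the variables themselves,
so the key needs no patching at all):

    /* Build the three-level key exactly once... */
    void increment_all(PMC *foo)
    {
        key_atom k3 = { 0, NULL };
        key_atom k2 = { 0, &k3 };
        key_atom k1 = { 0, &k2 };

        for (int i = 1; i <= 1000; i++) {
            k1.index = i;             /* ...then only patch the atoms... */
            for (int j = 1; j <= 1000; j++) {
                k2.index = j;
                for (int k = 1; k <= 1000; k++) {
                    k3.index = k;     /* ...with zero allocation in here */
                    op_add_keyed_int(foo, &k1, 1);
                }
            }
        }
    }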
> So what? Recursion's not bad. Besides, aggregates may well be
> multidimensional, in which case [...]
Right. I did address that; it was part of my "this *is* a win when"
diatribe. When indexing objects that aren't multidimensional, there is a
performance enhancement that isn't insignificant and requires no change
to the design. This is your cue to jump for joy ;)
> >>Speed. The reason for them is speed. Generality and elegance are
> >>nice, but the point is to go faster, which is why the special cases
> >>are there.
>
> >Given your objectives of speed, generality and elegance,
>
> I should point out here that elegance appears third in your list
> here. (It's fourth or fifth on mine)
Ooops.
> >3. Let PMCs create KEY * lists.
>
> That's what the interpreter is for. While there may not be sufficient
> information available to the interpreter and compiler to generate
> efficient key lists (which is possible, definitely) I can guarantee
> you that the interpreter has more information than any individual PMC
> vtable function will have.
>
> Keys are the interpreter's way of indicating the element of an
> aggregate. Construction of the key belongs to the interpreter.
> Interpretation of the key belongs to the PMC vtable functions. Leave
> them on the appropriate sides of the wall, please.
>
I certainly have a better feel now for how this is supposed to work. I
know being pestered isn't fun. I'll submit diffs on the PDDs so that this
isn't lost to the archives ;)
If I'm a little antagonistic, it's because I'm used to being in *your*
seat, and I can't stand not having the tools to get things done. That
being said, some of the people you've got volunteering are smart as a
whip. Make the most of them. Bring them up to speed and turn them loose.
You'll be very glad you did. You don't need to turn out working code and
quality documentation to pass the torch - you can harness these people
merely by passing them well-thought-out ideas. Most people won't resort
to debate when there's a workable solution on the table - only when
we're completely lost and confused about why a proposal is being
rejected out of hand.
A truly great language is one in which people can do things the designers
never thought of. A truly great development team is one where the
developers come up with ideas the lead never thought of =)
> > * function calls consume resources
> Generally incorrect. Function calls are, while not free, damned cheap
> on most semi-modern chips.
Your inner loop is a few lines of code. If every inner-loop execution
triggers a cascade of function calls, that advantage is lost. It may be
small, but certain cases do warrant changing extremely frequently used
recursive structures into iterative ones. I'm not saying this happens
here - I'm just saying that there is a point at which this cost becomes
significant.
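For the curious, here's the difference I'm pointing at, using the same
hypothetical key_atom type from my earlier sketch:

    /* Recursive walk: one nested call per key atom, every execution. */
    PMC *key_fetch_pmc_rec(PMC *agg, const key_atom *key)
    {
        if (key == NULL)
            return agg;
        return key_fetch_pmc_rec(
            agg->vtable->get_pmc_keyed_int(agg, key->index), key->next);
    }

    /* Iterative walk: the vtable calls remain, but the per-atom
     * call/return overhead is gone. A good compiler may make this
     * transformation itself for a tail call like the one above, but
     * it's not something to count on across platforms. */
    PMC *key_fetch_pmc_iter(PMC *agg, const key_atom *key)
    {
        for (; key != NULL; key = key->next)
            agg = agg->vtable->get_pmc_keyed_int(agg, key->index);
        return agg;
    }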
> > * assuming a 2 meg cache on a machine with a 2 meg cache makes for a
> > program that works *great* on your machine, then proceeds to suck
> > pond scum on mine =) assuming a 1 meg cache, the program will run
> > marginally slower on yours, but an order of magnitude faster on mine
>
> That's why I've got a 300MHz original Celeron system here.
Given that, I can sleep at night knowing Parrot will *never* turn into
the JVM.
>
> > * caches, virtual memory, and all of their ilk work best when you pretend
> > they don't exist. think of them as little faerie helpers - don't demand
> > work, and when your shoes are fixed in the morning, leave them a treat ;)
>
> You forgot a few.
Gnomes, trolls, hobgoblins, satyrs, gremlins, svirfneblin, fair folk,
little people, fae, nymphs, golems, dryads, imps, jinn, ... am I any
closer?
>
> * Think about the common case and plan for it
> * Make sure your performance assumptions aren't out of date
> * Reevaluate after bottlenecks are removed
Agreed.
> >Case in point:
> >
> >Perl 5 runs *awesome* on a 486/25.
>
> You're beneath our floor. Performance issues on that system aren't
> something I'm worried about, any more than I'm worried about not
> compiling on a K&R-only C compiler. They're only interesting and
> worth addressing if it's an easy way to address performance issues on
> hardware we do care about. (Palms, for example)
That's not the point. If it takes days to open a window and print "Hello
World" in it, you're abusing the hardware, and you've gone down the road
of the JVM. It's the creeping thing: waste a little time here, waste a
little there, and suddenly it's only marginal even on fast hardware.
I don't think this is going to be an issue, though. Still, prior art has
a lot to say about this.
> >In summary, I beg Dan to reconsider =)
>
> This is always in order, but in this case I think you've not given
> sufficient cause. Part of that is because the design's not been
> sufficiently set down, which is my fault--I do realize that makes
> things difficult. (I'll point out that this is why I did ask people
> to hold off a bit...)
I agree with Dan now that I understand things better. My complaints have
been addressed, with the one exception of refactoring away the code
bloat. I feel this is a small change in implementation and shouldn't
impact the design. I hope Dan will (time permitting) consider it, and
I'll be happy to hash it out with him on IRC to make sure both parties
understand exactly what is being said and that I don't continue to miss
things ;)
-scott