On Mon, Nov 14, 2011 at 01:18:56AM +0100, Bernd Paysan wrote:
Am Sonntag, 13. November 2011, 18:41:54 schrieb Josh Grams:
Hm, looking at head?, it really looks wrong. Try replacing it with this:
: head? ( addr -- f )
\G heuristic check whether addr is a name token; may deliver false
\G positives; addr must be a valid address
dup dup aligned
if
drop false exit \ heads are aligned
then
namestring dup $1F if
2drop false exit \ realistically the name is short
then
+ cfaligned @ here forthstart within ; \ and the cfa is outside
That does look more straightforward. But it doesn't eliminate the
problem I was having. I guess I never gave a good minimal test case.
This should reproduce it for reliably:
here 32 cells erase wordlist .voc
Hmm...a zero-length name should always be an invalid header, so I assume
replacing `$1F ` with `$20 1 WITHIN` should be OK?
Yes, that should do it. Thanks for the reliable test.
I have now tested where the two versions produce different results
(which is either a false positive or a false negative for one of the
versions). Program below. On my system running this as follows
produces the following results:
[~/gforth:76417] gforth xxx.fs -e test bye |grep new accepts as head|wc -l
781
[~/gforth:76418] gforth xxx.fs -e test bye |grep old accepts as head|wc -l
4546
So both versions differ by a lot, and given that the old version does
not produce false negatives, the new version obviously produces at
least 781 false positives. That's a lot, given that there are only
2987 names in the hash table.
Also, the new test recognizes 3634 names in the dictionary; the
difference from the hash table population is only 647. The difference
from the 781 number above can be explained by 134 false negatives from
the new version, and/or by false negatives from the old version.
Anyway, these numbers are much too high for my taste. Maybe we should
write a reliable test instead of a heuristic one, by adding some more data.
E.g., if we had a list of all wordlists, we could just check if the
addr matches any of the words in any wordlist. If that's too slow for
some applications, we can add some caching.
Or we might check all the words in the hash table (but not all
wordlists use the hash table).
What do you think?
- anton
Here's the program:
\ test head? implementations
: new-head? ( addr -- f )
\G heuristic check whether addr is a name token; may deliver false
\G positives; addr must be a valid address
dup dup aligned
if
drop false exit \ heads are aligned
then
namestring dup $20 $1 within if
2drop false exit \ realistically the name is short
then
+ cfaligned @ here forthstart within ; \ and the cfa is outside
: old-head? ( addr -- f )
\G heuristic check whether addr is a name token; may deliver false
\G positives; addr must be a valid address; returns 1 for
\G particularly unsafe positives
\ we follow the link fields and check for plausibility; two
\ iterations should catch most false addresses: on the first
\ iteration, we may get an xt, on the second a code address (or
\ some code), which is typically not in the dictionary.
\ we added a third iteration for working with code and ;code words.
3 0 do
dup dup aligned if \ protect @ against unaligned accesses
drop false unloop exit
then
dup @ dup
if ( addr addr1 )
dup rot forthstart within
if \ addr1 is outside forthstart..addr, not a head
drop false unloop exit
then ( addr1 )
else \ 0 in the link field, no further checks
2drop 1 unloop exit \ this is very unsure, so return 1
then
loop
\ in dubio pro:
drop true ;
: my-.name ( nt -- )
namestring dup . len= .
16 min dump ;
: test
here forthstart do
i new-head? i old-head? over if
cr if
. new
else
. old
then
. accepts as head: i hex. i my-.name
else
drop
then
loop ;
: testnew
0 here forthstart do
i new-head? -
loop ;