Re: [gforth] ORDER bug

2011-11-17 Thread Anton Ertl
On Mon, Nov 14, 2011 at 01:18:56AM +0100, Bernd Paysan wrote:
 Am Sonntag, 13. November 2011, 18:41:54 schrieb Josh Grams:
  Hm, looking at head?, it really looks wrong.  Try replacing it with this:
  : head? ( addr -- f )
  
  \G heuristic check whether addr is a name token; may deliver false
  \G positives; addr must be a valid address
  
  dup dup aligned 
  if
 
 drop false exit \ heads are aligned
 
  then
  namestring dup $1F  if
 
 2drop false exit \ realistically the name is short
 
  then
  + cfaligned @ here forthstart within ; \ and the cfa is outside
  
  That does look more straightforward.  But it doesn't eliminate the
  problem I was having.   I guess I never gave a good minimal test case.
  This should reproduce it for reliably:
  
  here 32 cells erase  wordlist .voc
  
  Hmm...a zero-length name should always be an invalid header, so I assume
  replacing `$1F ` with `$20 1 WITHIN` should be OK?
 
 Yes, that should do it.  Thanks for the reliable test.

I have now tested where the two versions produce different results
(which is either a false positive or a false negative for one of the
versions).  Program below.  On my system running this as follows
produces the following results:

[~/gforth:76417] gforth xxx.fs -e test bye |grep new accepts as head|wc -l
781
[~/gforth:76418] gforth xxx.fs -e test bye |grep old accepts as head|wc -l
4546

So both versions differ by a lot, and given that the old version does
not produce false negatives, the new version obviously produces at
least 781 false positives.  That's a lot, given that there are only
2987 names in the hash table.

Also, the new test recognizes 3634 names in the dictionary; the
difference from the hash table population is only 647.  The difference
from the 781 number above can be explained by 134 false negatives from
the new version, and/or by false negatives from the old version.

Anyway, these numbers are much too high for my taste.  Maybe we should
write a reliable test instead of a heuristic one, by adding some more data.

E.g., if we had a list of all wordlists, we could just check if the
addr matches any of the words in any wordlist.  If that's too slow for
some applications, we can add some caching.

Or we might check all the words in the hash table (but not all
wordlists use the hash table).

What do you think?

- anton

Here's the program:
\ test head? implementations

: new-head? ( addr -- f )
\G heuristic check whether addr is a name token; may deliver false
\G positives; addr must be a valid address
dup dup aligned 
if
drop false exit \ heads are aligned
then
namestring dup $20 $1 within if
2drop false exit \ realistically the name is short
then
+ cfaligned @ here forthstart within ; \ and the cfa is outside

: old-head? ( addr -- f )
\G heuristic check whether addr is a name token; may deliver false
\G positives; addr must be a valid address; returns 1 for
\G particularly unsafe positives
\ we follow the link fields and check for plausibility; two
\ iterations should catch most false addresses: on the first
\ iteration, we may get an xt, on the second a code address (or
\ some code), which is typically not in the dictionary.
\ we added a third iteration for working with code and ;code words.
3 0 do
dup dup aligned  if \ protect @ against unaligned accesses
drop false unloop exit
then
dup @ dup
if ( addr addr1 )
dup rot forthstart within
if \ addr1 is outside forthstart..addr, not a head
drop false unloop exit
then ( addr1 )
else \ 0 in the link field, no further checks
2drop 1 unloop exit \ this is very unsure, so return 1
then
loop
\ in dubio pro:
drop true ;

: my-.name ( nt -- )
namestring dup . len= .
16 min dump ;

: test
here forthstart do
i new-head? i old-head? over  if
cr if
. new
else
. old
then
.  accepts as head:  i hex. i my-.name
else
drop
then
loop ;

: testnew
0 here forthstart do
i new-head? -
loop ;



Re: [gforth] ORDER bug

2011-11-13 Thread Josh Grams
OK, I get it now.  There's junk in unALLOTted memory up to a certain
point; after that it's all zeroes.  So after that point, then if you
have a wordlist at the end of the dictionary, `head?` thinks it's is
followed by a header even though it isn't, and then that causes trouble.
So compiling anything or putting data there or whatever makes it go
away.  That's not so bad.  Disconcerting though.  Maybe there should be
a comment on `.voc` -- it took me a while to think to look at `head?`...

--Josh



Re: [gforth] ORDER bug

2011-11-12 Thread Josh Grams
Ah, ok, with some more digging around, I'm guessing it's a false
positive in `head?`.  Though it happens with VOCABULARY also (that's how
I ran into the problem).  Is it a bit dodgy in that case also?

--Josh