date:20150226

Re: the hash function

2015-02-26 Thread Alexander Burger

Hi Enrique,

 I found some strange behaviour in the hash function.
 
 When applied to numbers, it works ok, but when applied
 to strings, it leads to a huge number of collisions. 
 ...
 # uniq hashed values, using 50000 different numbers: 50000
 # uniq hashed values, using 50000 different strings: 10271
 # ==

You are right. This is not optimal. Thanks for the hint!

The reason is the initSeed() function in src/big.c (initSeedE_E in
src64/big.l).

It uses numbers directly, and the names (which are technically also
numbers) in case of symbols. But for symbols these numbers have less
entropy, as they are not as densely packed bit patterns as pure
numbers (basically what Oskar Wieland points out in his reply).

initSeed() should be improved, by doing more than simply adding up the
32-bit or 64-bit digits, at least in case of symbols. Any proposals?
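
To make the effect concrete, here is a minimal sketch in C. It is not the code from src/big.c: `seed_additive` is a simplified stand-in that packs the bytes of a name into 32-bit words and adds the words up, and `seed_fnv1a` is the well-known FNV-1a hash, swapped in only as one example of a function that mixes bits. Both names, `count_unique`, and the byte-packing scheme are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in (assumption): pack the name's bytes into 32-bit
 * words, low byte first, and add the words up without any mixing. */
uint32_t seed_additive(const char *s) {
    uint32_t sum = 0, word = 0;
    unsigned shift = 0;
    for (; *s; ++s) {
        word |= (uint32_t)(unsigned char)*s << shift;
        shift += 8;
        if (shift == 32) {          /* word is full: add it and start anew */
            sum += word;
            word = 0;
            shift = 0;
        }
    }
    return sum + word;              /* add the last, partially filled word */
}

/* FNV-1a (32-bit): every input byte is XORed in and then multiplied,
 * so each byte influences all higher bits of the result. */
uint32_t seed_fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

static int cmp_u32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Count the distinct seeds produced for the decimal strings "1" .. "n".
 * Returns 0 if n is 0 or memory cannot be allocated. */
size_t count_unique(uint32_t (*seed)(const char *), unsigned n) {
    if (n == 0) return 0;
    uint32_t *v = malloc((size_t)n * sizeof *v);
    if (v == NULL) return 0;
    char buf[16];
    for (unsigned i = 0; i < n; ++i) {
        snprintf(buf, sizeof buf, "%u", i + 1);
        v[i] = seed(buf);
    }
    qsort(v, n, sizeof *v, cmp_u32);
    size_t unique = 1;
    for (unsigned i = 1; i < n; ++i)
        if (v[i] != v[i - 1]) ++unique;
    free(v);
    return unique;
}
```

Calling `count_unique(seed_additive, 50000)` and `count_unique(seed_fnv1a, 50000)` side by side shows how many distinct seeds each approach yields for the same strings. In the additive version, a five-character string such as "12345" contributes its fifth byte to the same low byte as its first, so strings that differ only in how those two digits split their sum collide.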

♪♫ Alex
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe

the hash function

2015-02-26 Thread Enrique Sánchez

Hello,

I found some strange behaviour in the hash function.

When applied to numbers, it works ok, but when applied
to strings, it leads to a huge number of collisions. 

An example:
# ==

(setq N 50000
  Lnumbers (range 1 N)
  Lstrings (mapcar format (range 1 N)) )

(prinl "uniq hashed values, using " N " different numbers: "
   (length (uniq (mapcar hash Lnumbers))) )

(prinl "uniq hashed values, using " N " different strings: "
   (length (uniq (mapcar hash Lstrings))) )

(bye)

# ==
# PRINTED RESULTS:
# 
# uniq hashed values, using 50000 different numbers: 50000
# uniq hashed values, using 50000 different strings: 10271
# ==

enrique.





Re: miniPicoLisp: miniCodeROM in Alcor6L

2015-02-26 Thread Raman Gopalan

Dear Alex,

 I see that the original has in these cases

any const __attribute__ ((__aligned__(2*WORD))) Rom[] = {

 Does this make any difference? After all, I would expect 'WORD' to be 4
 on a 32-bit machine.

It seems to be making a difference. When I simply print the size of `WORD'
(renamed to `PICOLISP_WORD' to resolve a conflict with one of avr32's
device defines) I get 4. Clear.

However, when I don't explicitly use the variable attribute, I run into an
Unaligned memory fault. Is there another way around this?
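
One plausible explanation, offered here as an assumption rather than a confirmed diagnosis: if the interpreter uses low address bits as type tags, then every two-word cell must start on a 2*WORD boundary, and natural word alignment alone does not guarantee that. The sketch below uses a simplified word array in place of the real `Rom`; `WORD` is redefined locally, and `is_aligned` and `word_bit_set` are hypothetical helpers.

```c
#include <stdint.h>

/* Local stand-in for the interpreter's WORD: the size of one machine word. */
#define WORD ((uintptr_t)sizeof(uintptr_t))

/* Simplified heap: an array of words in which each pair forms one cell.
 * The GCC/Clang attribute forces the array to start on a 2*WORD boundary. */
uintptr_t const __attribute__ ((__aligned__(2 * sizeof(uintptr_t))))
    Rom[4] = {0, 0, 0, 0};

/* True if p is a multiple of boundary. */
int is_aligned(const void *p, uintptr_t boundary) {
    return (uintptr_t)p % boundary == 0;
}

/* Assumed tag test: with 2*WORD alignment, the WORD bit of an address is
 * clear for the first word of a cell and set for the second one. */
int word_bit_set(const void *p) {
    return ((uintptr_t)p & WORD) != 0;
}
```

Without the attribute, the array is only guaranteed word alignment, so `word_bit_set(&Rom[0])` could come out either way depending on where the linker places it, and any tag test that relies on that bit would then misclassify pointers.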

 Perhaps a stack problem? Also, I'm not sure if allocating 's' in the
 middle of a code body is right. I always did that in an explicit code
 block { .. }, but perhaps current C versions can handle this.

 Perhaps you can debug what string exactly you get in 's'?
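
On the question of declaring 's' in the middle of a body: C99 and later allow declarations after statements, and also variable-length arrays, so no explicit { .. } block is needed with a C99 compiler (variable-length arrays became optional in C11, though GCC still supports them). A minimal sketch follows; `buf_size` and `parse_pin` are hypothetical stand-ins, not the `bufSize` or `pio_value_parse` from the code under discussion, and the "P<port>_<pin>" name format is assumed from the 'PB_29 example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed for the name plus its terminating NUL. */
static size_t buf_size(const char *name) {
    return strlen(name) + 1;
}

/* Parse a pin name of the assumed form "P<port letter>_<number>", for
 * example "PB_29", and return the pin number, or -1 if it is malformed. */
int parse_pin(const char *name) {
    if (name == NULL)
        return -1;                  /* a statement comes first ... */

    char s[buf_size(name)];         /* ... then a C99 declaration (a VLA) */
    memcpy(s, name, sizeof s);      /* sizeof a VLA is evaluated at run time */

    if (s[0] != 'P' || s[1] < 'A' || s[1] > 'Z' || s[2] != '_' || s[3] == '\0')
        return -1;

    int pin = 0;
    for (size_t i = 3; s[i] != '\0'; ++i) {
        if (s[i] < '0' || s[i] > '9')
            return -1;
        pin = pin * 10 + (s[i] - '0');
    }
    return pin;
}
```

Printing `s` right after the copy is a quick way to check what string actually arrives in the buffer.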

`s[]' holds the data correctly. The code was indeed working as expected. I
felt silly. It turns out that I had two entries named `pio-pin-setdir' (a typo).
I simply had to change the second one to `pio-port-setdir'.

pio-pin-setdir {plisp_pio_pin_setdir} --
pio-pin-setpull {plisp_pio_pin_setpull}
pio-pin-setval {plisp_pio_pin_setval}
pio-pin-sethigh {plisp_pio_pin_sethigh}
pio-pin-setlow {plisp_pio_pin_setlow}
pio-pin-getval {plisp_pio_pin_getval}
pio-pin-setdir {plisp_pio_port_setdir} --

R

P.S. I will also try to run this on my stm32f103re and post the results
here.

On 25 February 2015 at 12:54, Alexander Burger a...@software-lab.de wrote:

 Hi Raman,

 thanks for the report!

  2) I assumed step-1 should be sufficient but I ran into an Unaligned
  memory fault when I executed the build with step-1 alone. I then
  introduced variable attributes for `Rom' and `Ram' (included in main.c).
 
  any const Rom[] __attribute__ ((aligned (8))) = {
     #include "rom.d"
  };
 
  any Ram[] __attribute__ ((aligned (8))) = {
     #include "ram.d"
  };

 I see that the original has in these cases

any const __attribute__ ((__aligned__(2*WORD))) Rom[] = {

 Does this make any difference? After all, I would expect 'WORD' to be 4
 on a 32-bit machine.



  any plisp_pio_pin_setdir(any ex) {
     any x, y;
     // Some code here

     x = cdr(x);
     NeedSym(ex, y = EVAL(car(x)));
     char s[bufSize(y)];
     bufString(y, s);
     ret = pio_value_parse(s);
     PIO_CHECK(ret);

     plisp_pio_gen_setdir(ex, NULL, ret, PIO_PIN_OP, dir);
     return Nil;
  }
 
  I would then invoke my function like this:
  (pio-pin-setdir *pio-output* 'PB_29)
 
  Strange, but it looks like a part of 'PB_29 is getting stripped somehow

Re: the hash function

2015-02-26 Thread Oskar Wieland

this behavior may be caused by the fact that you are using only 26
values for letters out of 256 possible numbers. by considering upper and
lower case it's 52 letters out of 256, which accounts for about 20%.

52 ÷ 256 × 50000 = 10156.25 (observed: 10271)
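
The estimate above can be written out as a small C helper. This is only a sketch of the arithmetic: `estimate_unique` is a hypothetical name, and N = 50000 is assumed because it is the value for which the figure 10156.25 comes out.

```c
/* Rough estimate of the number of distinct hash values when only `used`
 * of the `total` possible byte values occur in the input: the fraction
 * used/total scaled by the number of inputs n. */
double estimate_unique(double used, double total, double n) {
    return used / total * n;
}
```

With `estimate_unique(52, 256, 50000)` the estimate is 10156.25, which is within roughly 1.1% of the 10271 distinct values reported.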

regards
oskar


On 02/27/2015 03:25 AM, Enrique Sánchez wrote:
 Hello,
 
 I found some strange behaviour in the hash function.
 
 When applied to numbers, it works ok, but when applied
 to strings, it leads to a huge number of collisions. 
 
 An example:
 # ==
 
 (setq N 50000
   Lnumbers (range 1 N)
   Lstrings (mapcar format (range 1 N)) )
 
 (prinl "uniq hashed values, using " N " different numbers: "
    (length (uniq (mapcar hash Lnumbers))) )
 
 (prinl "uniq hashed values, using " N " different strings: "
    (length (uniq (mapcar hash Lstrings))) )
 
 (bye)
 
 # ==
 # PRINTED RESULTS:
 # 
 # uniq hashed values, using 50000 different numbers: 50000
 # uniq hashed values, using 50000 different strings: 10271
 # ==
 
 enrique.
 
 
 
 

