Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-19 Thread Serhat Sevki Dincer
Hi,

Chris kindly searched collisions on his 6 TiB ram server and we could
not find any for more than 5 x 2^37 inputs (for both + and ^ versions)
! Final version of the hash is available at
https://github.com/jfcg/sixb

Let me know if you find one ;)

On Tue, Feb 19, 2019 at 10:44 AM Chris Burkert  wrote:
> Good Morning Serhat, here it is:
>
> # time ./prime2x
> prime2x xor: 0 collisions
>
> real355m51.484s
> user12490m36.231s
> sys 4079m42.192s
>
 Am Mo., 18. Feb. 2019 um 08:47 Uhr schrieb Serhat Sevki Dincer 
 :
> Great, can you also try xor version?
> I wonder if it will have any collisions.
>
> I made a tiny improvement to sorting here github.com/jfcg/sorty
>
> Thanks..
>
> 18 Şub 2019 Pzt 10:42 tarihinde Chris Burkert  
> şunu yazdı:
>> Hello Serhat,
>> this time it ran fine:
>>
>> # time ./prime2m
>> prime2m add: 0 collisions
>>
>> real350m28.446s
>> user12765m29.456s
>> sys 5997m49.762s
>>
>> cheers
>> Chris
>>
>> Am Mo., 18. Feb. 2019 um 08:11 Uhr schrieb Serhat Sevki Dincer 
>> :
>>> Thanks, how is the new code doing in terms of cpu / ram usage?
>>> Is it still running?

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
package main

// Collision test for a simple hash with short utf8 text inputs
// Author: Serhat Şevki Dinçer, jfcgaussATgmail

import (
	"fmt"
	"github.com/jfcg/sorty"
)

func Txt2int(s string) uint64 {
	x := uint64(len(s))
	for i := len(s) - 1; i >= 0; i-- {
		x *= 11400714819323198549
		x ^= uint64(s[i])
	}
	return x
}

const N = 687194767360 // 274877906944

var hl = make([]uint64, N)  // hash list: 5 TiB ram
var bf = [6]byte{30, 255, 127, 1, 1, 1} // input buffer
var sgch = make(chan bool, 1)

func hashRange(hl []uint64, bf [6]byte, signal bool) {
	for i := len(hl) - 1; i >= 0; i-- {
		hl[i] = Txt2int(string(bf[:]))

		// next utf8-ish input
		for k := 5; ; k-- {
			bf[k] += 4

			if bf[k] > 3 {
break
			}
			bf[k]++ // continue with carry
		}
	}

	if signal {
		sgch <- true
	}
}

func main() {
	const Q = N / 4 // 4 goroutines will fill hash list
	for k := 0; k < 3; k++ {
		go hashRange(hl[k*Q:(k+1)*Q], bf, true)
		bf[5]++
	}
	hashRange(hl[3*Q:], bf, false) // main is the 4th goroutine

	for k := 0; k < 3; k++ {
		<-sgch // wait friends
	}

	sorty.ArU8 = hl
	sorty.SortU8()
	k := 0
	for i := N - 1; i > 0; i-- {
		if hl[i] == hl[i-1] {
			k++
		}
	}
	fmt.Println("prime2x xor:", k, "collisions") // xor
}


Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-13 Thread Serhat Sevki Dincer
14 Şub 2019 Per 01:58 tarihinde Michael Jones 
şunu yazdı:

> Serhat,
>
> Some more ideas for you to consider: the expected number of collisions for
> an ideal random hash, the option of "folding in" the high bits of the hash
> rather than truncating, and finer control of operation.
>
> https://play.golang.org/p/92ERC4PJKAL
>

Hi Michael,
Here with your variant:

x = uint64(1<<64-59) ^ uint64(len(s))

for i := len(s) - 1; i >= 0; i-- {
x ^= uint64(s[i])
x *= 11400714819323198549
}

// fold high bits

you have a direct collision with input length's lowest byte (which is
practically the length) and input's first byte. This is better for
initialization:

x = uint64(len(s))
x *= 11400714819323198549

at the cost of an extra multiplication.

Folding high bits does not change collision characteristic, right?
Also when the last operation is a multiplication, you don't seem to need
it, if at all it is useful.

>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-13 Thread Michael Jones
Serhat,

Some more ideas for you to consider: the expected number of collisions for
an ideal random hash, the option of "folding in" the high bits of the hash
rather than truncating, and finer control of operation.

https://play.golang.org/p/92ERC4PJKAL

On Wed, Feb 13, 2019 at 12:20 PM Serhat Şevki Dinçer 
wrote:

> On Tuesday, February 12, 2019 at 9:51:17 PM UTC+3, Michael Jones wrote:
>>
>> Serhat, some ideas for you...
>> https://play.golang.org/p/7QPy5wa-9eO
>>
>
> Interestingly I found out input iteration order does matter :) 15,33 shift
> version yields an amazing number of collisions (7_910_886_368 collisions
> over 7_918_845_952 inputs) when fed with a spesific 6-byte sequence
> (attached).
> rotate-27 version interestingly gave no collisions for this case. prime
> version also had no collisions. prime version with ^= and -= instead of +=
> also gave no collision.
>
> I want to identify a collision on the prime version (any variant ^= -= +=)
> with small inputs, say sum of lengths <= 12. Does anyone have any idea how
> to identify such inputs?
> So far prime version seems to be the most promising (in terms of high
> minimum sum of collding inputs)..
>
> Thanks..
>
> Note: Thanks Damian, it looks like an advanced test suite for cryto
> hashes, though text input tests seems limited.
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>


-- 

*Michael T. jonesmichael.jo...@gmail.com *

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-13 Thread Serhat Şevki Dinçer
On Tuesday, February 12, 2019 at 9:51:17 PM UTC+3, Michael Jones wrote:
>
> Serhat, some ideas for you...
> https://play.golang.org/p/7QPy5wa-9eO
>

Interestingly I found out input iteration order does matter :) 15,33 shift 
version yields an amazing number of collisions (7_910_886_368 collisions 
over 7_918_845_952 inputs) when fed with a spesific 6-byte sequence 
(attached).
rotate-27 version interestingly gave no collisions for this case. prime 
version also had no collisions. prime version with ^= and -= instead of += 
also gave no collision.

I want to identify a collision on the prime version (any variant ^= -= +=) 
with small inputs, say sum of lengths <= 12. Does anyone have any idea how 
to identify such inputs?
So far prime version seems to be the most promising (in terms of high 
minimum sum of collding inputs)..

Thanks..

Note: Thanks Damian, it looks like an advanced test suite for cryto hashes, 
though text input tests seems limited.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


txt2int3.go
Description: Binary data


Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-12 Thread Michael Jones
Serhat, some ideas for you...
https://play.golang.org/p/7QPy5wa-9eO

On Tue, Feb 12, 2019 at 9:23 AM Damian Gryski  wrote:

> For more information on hash function goodness tests, check out
> https://github.com/demerphq/smhasher
>
> Damian
>
> On Tuesday, February 12, 2019 at 5:23:39 AM UTC-8, Serhat Şevki Dinçer
> wrote:
>>
>> On Saturday, February 9, 2019 at 5:23:14 PM UTC+3, Serhat Şevki Dinçer
>> wrote:
>>>
>>> On Fri, Feb 8, 2019 at 7:42 PM Michael Jones wrote:
>>> > clustering:
>>> >
>>> https://www.cs.cornell.edu/courses/cs3110/2014fa/lectures/13/lec13.html
>>> >
>>> > careful hash functions often treat short inputs specially.
>>> >
>>> > iterated shift-xor alone is weak in expanding the "changed bits(s)"
>>> signal, at least by comparison to a) large prime multiply, b) good s-boxes,
>>> c) introduction of keyed material.
>>> Hm, thanks. I would like to try a particular version of this prime
>>> multiplication idea wih utf8 strings.
>>> I wrote for loops to find collisions (attached). It takes 3 seconds
>>> and finds no collision for strings with length < 4.
>>> The next step (including length=4, commented in the code) will take
>>> 13+ min and 32+ gb ram, so I appreciate if anyone with sufficient RAM
>>> could try that and report the result ;P
>>> Thanks..
>>>
>>
>> I got access to a server with 64 gb ram. On it, using a Go map did not
>> work, so I used a list and sorted it for identifying collisions.
>> It turned out both prime and shift versions (attached) of the simple hash
>> turned out to be really good for small inputs. No collisions for
>> 7_918_845_952 inputs.
>> This required 59 GiB of ram. For a 64-bit cryptographic hash output, the
>> probability of a collision for that many inputs is about %82.
>>
>> What I am curious about next is
>> - How further can this test be taken ? When are the first collisions for
>> these simple hashes with the given code ?
>> - What are their "minimum sum of colliding inputs lengths" ?
>> - Is there a (smooth) trade-off between being cryptographic /
>> good-bit-mixing *and* being nice on small inputs / high minimum sum of
>> colliding inputs lengths ? Assume that we dont treat small inputs
>> differently, and there is one general algorithm for all inputs..
>> - Can a cryptographic hash function have high minimum sum of colliding
>> inputs lengths, or be nice on small inputs ?
>>
>> Thanks..
>>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>


-- 

*Michael T. jonesmichael.jo...@gmail.com *

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-12 Thread Damian Gryski
For more information on hash function goodness tests, check 
out https://github.com/demerphq/smhasher

Damian

On Tuesday, February 12, 2019 at 5:23:39 AM UTC-8, Serhat Şevki Dinçer 
wrote:
>
> On Saturday, February 9, 2019 at 5:23:14 PM UTC+3, Serhat Şevki Dinçer 
> wrote:
>>
>> On Fri, Feb 8, 2019 at 7:42 PM Michael Jones wrote: 
>> > clustering: 
>> > https://www.cs.cornell.edu/courses/cs3110/2014fa/lectures/13/lec13.html 
>> > 
>> > careful hash functions often treat short inputs specially. 
>> > 
>> > iterated shift-xor alone is weak in expanding the "changed bits(s)" 
>> signal, at least by comparison to a) large prime multiply, b) good s-boxes, 
>> c) introduction of keyed material. 
>> Hm, thanks. I would like to try a particular version of this prime 
>> multiplication idea wih utf8 strings. 
>> I wrote for loops to find collisions (attached). It takes 3 seconds 
>> and finds no collision for strings with length < 4. 
>> The next step (including length=4, commented in the code) will take 
>> 13+ min and 32+ gb ram, so I appreciate if anyone with sufficient RAM 
>> could try that and report the result ;P 
>> Thanks.. 
>>
>
> I got access to a server with 64 gb ram. On it, using a Go map did not 
> work, so I used a list and sorted it for identifying collisions.
> It turned out both prime and shift versions (attached) of the simple hash 
> turned out to be really good for small inputs. No collisions for 
> 7_918_845_952 inputs.
> This required 59 GiB of ram. For a 64-bit cryptographic hash output, the 
> probability of a collision for that many inputs is about %82.
>
> What I am curious about next is
> - How further can this test be taken ? When are the first collisions for 
> these simple hashes with the given code ?
> - What are their "minimum sum of colliding inputs lengths" ?
> - Is there a (smooth) trade-off between being cryptographic / 
> good-bit-mixing *and* being nice on small inputs / high minimum sum of 
> colliding inputs lengths ? Assume that we dont treat small inputs 
> differently, and there is one general algorithm for all inputs..
> - Can a cryptographic hash function have high minimum sum of colliding 
> inputs lengths, or be nice on small inputs ?
>
> Thanks..
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-12 Thread Serhat Şevki Dinçer
On Saturday, February 9, 2019 at 5:23:14 PM UTC+3, Serhat Şevki Dinçer 
wrote:
>
> On Fri, Feb 8, 2019 at 7:42 PM Michael Jones wrote: 
> > clustering: 
> > https://www.cs.cornell.edu/courses/cs3110/2014fa/lectures/13/lec13.html 
> > 
> > careful hash functions often treat short inputs specially. 
> > 
> > iterated shift-xor alone is weak in expanding the "changed bits(s)" 
> signal, at least by comparison to a) large prime multiply, b) good s-boxes, 
> c) introduction of keyed material. 
> Hm, thanks. I would like to try a particular version of this prime 
> multiplication idea wih utf8 strings. 
> I wrote for loops to find collisions (attached). It takes 3 seconds 
> and finds no collision for strings with length < 4. 
> The next step (including length=4, commented in the code) will take 
> 13+ min and 32+ gb ram, so I appreciate if anyone with sufficient RAM 
> could try that and report the result ;P 
> Thanks.. 
>

I got access to a server with 64 gb ram. On it, using a Go map did not 
work, so I used a list and sorted it for identifying collisions.
It turned out both prime and shift versions (attached) of the simple hash 
turned out to be really good for small inputs. No collisions for 
7_918_845_952 inputs.
This required 59 GiB of ram. For a 64-bit cryptographic hash output, the 
probability of a collision for that many inputs is about %82.

What I am curious about next is
- How further can this test be taken ? When are the first collisions for 
these simple hashes with the given code ?
- What are their "minimum sum of colliding inputs lengths" ?
- Is there a (smooth) trade-off between being cryptographic / 
good-bit-mixing *and* being nice on small inputs / high minimum sum of 
colliding inputs lengths ? Assume that we dont treat small inputs 
differently, and there is one general algorithm for all inputs..
- Can a cryptographic hash function have high minimum sum of colliding 
inputs lengths, or be nice on small inputs ?

Thanks..

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


txt2int2.go
Description: Binary data


Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-09 Thread Serhat Sevki Dincer
On Fri, Feb 8, 2019 at 7:42 PM Michael Jones  wrote:
> clustering:
> https://www.cs.cornell.edu/courses/cs3110/2014fa/lectures/13/lec13.html
>
> careful hash functions often treat short inputs specially.
>
> iterated shift-xor alone is weak in expanding the "changed bits(s)" signal, 
> at least by comparison to a) large prime multiply, b) good s-boxes, c) 
> introduction of keyed material.
Hm, thanks. I would like to try a particular version of this prime
multiplication idea wih utf8 strings.
I wrote for loops to find collisions (attached). It takes 3 seconds
and finds no collision for strings with length < 4.
The next step (including length=4, commented in the code) will take
13+ min and 32+ gb ram, so I appreciate if anyone with sufficient RAM
could try that and report the result ;P
Thanks..

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
package main

import "fmt"

func Txt2int(s string) uint64 {
	x := uint64(len(s))
	for i := len(s) - 1; i >= 0; i-- {
		x *= 11400714819323198549
		x += uint64(s[i])
	}
	return x
}

const N = 16646656 // 4244897281

var mp = make(map[uint64]bool, N)
var bf = []byte("abcd")

func main() {
	mp[0] = true

	for i := 255; i > 0; i-- {
		bf[0] = byte(i)
		mp[Txt2int(string(bf[:1]))] = true

		for k := 255; k > 0; k-- {
			bf[1] = byte(k)
			mp[Txt2int(string(bf[:2]))] = true

			for r := 255; r > 0; r-- {
bf[2] = byte(r)
mp[Txt2int(string(bf[:3]))] = true

/*	for s := 255; s > 0; s-- {
	bf[3] = byte(s)
	mp[Txt2int(string(bf[:4]))] = true
} */
			}
		}
	}
	fmt.Println(N - len(mp))
}


Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-08 Thread Michael Jones
clustering:
https://www.cs.cornell.edu/courses/cs3110/2014fa/lectures/13/lec13.html

careful hash functions often treat short inputs specially.

iterated shift-xor alone is weak in expanding the "changed bits(s)" signal,
at least by comparison to a) large prime multiply, b) good s-boxes, c)
introduction of keyed material.

collisions will happen, and no general hash will attain the full period
possibility. yes a perfect hash is possible but not to my knowledge for
general strings.

strings tend to be very non-random (a kind of zipf's law for word
spelling/product names/text...) so there is a difficult demand for the hash
function. in spirit you want to encrypt and carefully fold down to
tablesize.

utf-8/16/32 strings are somewhat sparse...so you don't have as many bits of
input material as you may want.
(http://www.unicode.org/faq/utf_bom.html ... 21 bits is enough)

searching through parameter space works very well for many hash and similar
goals. important to know how to model the input and measure output quality.

On Fri, Feb 8, 2019 at 1:56 AM Serhat Sevki Dincer 
wrote:

> 8 Şub 2019 Cum 11:57 tarihinde Michael Jones 
> şunu yazdı:
>
>> ...says that in one particular test condition (8 character strings, 1M
>> random strings, all possible shift values)
>> and under one particular metric of virtue...
>>
>> x = x<<15 ^ x>>33
>>
>> ...gives the closest overall approximation to a random result. you might
>> try that in your application to see how well it works for you. that figure
>> of merit (44.608) is not particularly high but not terrible. A modification
>> of FNV1a can get to 22 (almost: 2.200355e+01) here.
>>
>
> Hm, so overlapping shifts help get better mixing. As a hash function gets
> better at mixing bits, it will almost surely have a collision (due to
> birthday paradox) with two inputs with lengths <= 4.
>
> Can we artificially guarantee a minimum sum of colliding (utf8 string)
> inputs that is > 8 ?
>
> I think such a hash needs to satisfy the following:
>
> 1) Number of valid utf8 strings with length <= 8 is less than (255^9 -
> 1)/254 which is less than 2^64
> So if the hash maps these strings to distinct uint64 values, then minimum
> sum of colliding input lengths is at least 9. Can it be > 9 ?
>
> 2) Be non-trivial. For example, this hash satisfies 1):
> if len(input) < 8, pad with zeros up to total 8 bytes and return
> otherwise return first 8 bytes of input
>
> this also satisfies 1):
> Treat input as integer in base 255, return input mod 2^64
>
> 3) have equidistribution for overall inputs
>
> Instead of statistical methods, we can maybe brute search among parametric
> simple hash functions for one that satisfies these 3 properties? Or
> explicitly design one?
>
>>

-- 

*Michael T. jonesmichael.jo...@gmail.com *

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-08 Thread Serhat Sevki Dincer
8 Şub 2019 Cum 11:57 tarihinde Michael Jones  şunu
yazdı:

> ...says that in one particular test condition (8 character strings, 1M
> random strings, all possible shift values)
> and under one particular metric of virtue...
>
> x = x<<15 ^ x>>33
>
> ...gives the closest overall approximation to a random result. you might
> try that in your application to see how well it works for you. that figure
> of merit (44.608) is not particularly high but not terrible. A modification
> of FNV1a can get to 22 (almost: 2.200355e+01) here.
>

Hm, so overlapping shifts help get better mixing. As a hash function gets
better at mixing bits, it will almost surely have a collision (due to
birthday paradox) with two inputs with lengths <= 4.

Can we artificially guarantee a minimum sum of colliding (utf8 string)
inputs that is > 8 ?

I think such a hash needs to satisfy the following:

1) Number of valid utf8 strings with length <= 8 is less than (255^9 -
1)/254 which is less than 2^64
So if the hash maps these strings to distinct uint64 values, then minimum
sum of colliding input lengths is at least 9. Can it be > 9 ?

2) Be non-trivial. For example, this hash satisfies 1):
if len(input) < 8, pad with zeros up to total 8 bytes and return
otherwise return first 8 bytes of input

this also satisfies 1):
Treat input as integer in base 255, return input mod 2^64

3) have equidistribution for overall inputs

Instead of statistical methods, we can maybe brute search among parametric
simple hash functions for one that satisfies these 3 properties? Or
explicitly design one?

>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-08 Thread Michael Jones
one quick result:

celeste:spin10 mtj$ spin10 -a 0,63 -b 0,63 -bins 32 -rotate ab -set 0,0,0,0
-samples 10 -trials 10
[0] best 5.683527373827505613e+01  8  0  0  0
[1] best 5.508690460484671547e+01  8  1  0  0
[2] best 5.434519430630660253e+01  8  2  0  0
[4] best 5.422007935279938806e+01  8  4  0  0
[6] best 5.280780313359068856e+01  8  6  0  0
[00011] best 5.218225799693001932e+01  8 11  0  0
[00015] best 5.112439186124714041e+01  8 15  0  0
[00035] best 4.965170712324208324e+01  8 35  0  0
[00040] best 4.907791674503666002e+01  8 40  0  0
[00043] best 4.693620797201791817e+01  8 43  0  0
[00324] best 4.608481026758963850e+01 13  4  0  0
[00481] best 4.460801497488986911e+01 15 33  0  0
complete: 3648 trials, 75.095 seconds, 48.578 trials/sec


...says that in one particular test condition (8 character strings, 1M
random strings, all possible shift values)
and under one particular metric of virtue...

x = x<<15 ^ x>>33

...gives the closest overall approximation to a random result. you might
try that in your application to see how well it works for you. that figure
of merit (44.608) is not particularly high but not terrible. A modification
of FNV1a can get to 22 (almost: 2.200355e+01) here.


On Thu, Feb 7, 2019 at 11:03 AM Serhat Şevki Dinçer 
wrote:

> On Tuesday, February 5, 2019 at 12:29:32 AM UTC+3, Michael Jones wrote:
>
>> I recently did just this in an effort (successful!) to make a well-known
>> simple hash function be its best with minor single CPU cycle changes.
>>
>
> yes I am told 15 is not the best shift amount, how about this?
> x = x<<27 ^ x>>37
>
> the upper limit of length sum could be 9, because hashes of 9-byte valid
> utf8 strings are less than 255^9, on average ~247 per uin64 value, so one
> such uint64 value most likely will be zero which is the hash of empty
> string. so you get a length sum of 9. can you bring it down? or can we
> "design" a hash to guarantee 9?
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>


-- 

*Michael T. jonesmichael.jo...@gmail.com *

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-07 Thread Serhat Şevki Dinçer
On Tuesday, February 5, 2019 at 12:29:32 AM UTC+3, Michael Jones wrote:

> I recently did just this in an effort (successful!) to make a well-known 
> simple hash function be its best with minor single CPU cycle changes. 
>

yes I am told 15 is not the best shift amount, how about this?
x = x<<27 ^ x>>37

the upper limit of length sum could be 9, because hashes of 9-byte valid 
utf8 strings are less than 255^9, on average ~247 per uin64 value, so one 
such uint64 value most likely will be zero which is the hash of empty 
string. so you get a length sum of 9. can you bring it down? or can we 
"design" a hash to guarantee 9?

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [go-nuts] A Measure of Hash Function Collisions

2019-02-04 Thread Michael Jones
There are many measures. One realm of them focus on the mixing properties
of the design as would be a consideration in cypher systems. The other
“experimentalist” realm considers how the hash performs compared to an
ideal hash function.

The latter approach is suitable for a broader range of developers. A google
search for measuring hash quality will quickly lead to expectations about
collisions from a random variable and also to equations for statistical
measures.

I recently did just this in an effort (successful!) to make a well-known
simple hash function be its best with minor single CPU cycle changes.

On Mon, Feb 4, 2019 at 11:49 AM Serhat Şevki Dinçer 
wrote:

> Hi,
>
> What is the minimum sum of input lengths len(s1)+len(s2) such that:
>
> - s1, s2 are distinct utf8 strings
> - Txt2int(s1) = Txt2int(s2)
>
> for the below simple hash function ?
>
> func Txt2int(s string) uint64 {
> x := uint64(len(s))
> for i := len(s) - 1; i >= 0; i-- {
> x = x<<15 ^ x>>49
> x += uint64(s[i])
> }
> return x
> }
>
> In general, how good is "minimum sum of colliding input lengths" as a
> measure of collision-resistance for a (not necessarily cryptographic) hash
> function ?
>
> Thanks..
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
-- 

*Michael T. jonesmichael.jo...@gmail.com *

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.