Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-28 Thread Bulat Ziganshin
Hello Richard,

Thursday, August 28, 2008, 5:28:46 AM, you wrote:

 trie: O(len)*O(width)
 hashed trie: O(len)
 hash: O(len)

 If width here refers to the branching factor of the trie,
 it's actually O(len.lg(width)), and the width that matters
 is not the *possible* number of choices but the *actual*
 number.

i thought about using list to hold all variations at the trie node. with
a (balanced) tree at each node we will have even more overheads


 The great problem with hash tables is devising good hash
 functions.  There is a vast literature about hash tables,
 but there is very little about how to design good hash
 functions for anything other than numbers and maybe strings.

1. tries also work only for strings and other lists
2. i don't want to go into discussing well-known pluses and minuses of
data structures. my point was just that we have great alternative to
trees/tries which should be implemented many years ago. i've used a
lots of assoclists in my program, sometimes this really degraded
performance (although it's yet another question - why tree/trie
structures doesn't provide simple List.lookup replacement function and
why i'm so lazy to still not learning their APIs)



-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-28 Thread Don Stewart
bulat.ziganshin:
 Hello Richard,
 
 Thursday, August 28, 2008, 5:28:46 AM, you wrote:
 
  trie: O(len)*O(width)
  hashed trie: O(len)
  hash: O(len)
 
  If width here refers to the branching factor of the trie,
  it's actually O(len.lg(width)), and the width that matters
  is not the *possible* number of choices but the *actual*
  number.
 
 i thought about using list to hold all variations at the trie node. with
 a (balanced) tree at each node we will have even more overheads
 
 
  The great problem with hash tables is devising good hash
  functions.  There is a vast literature about hash tables,
  but there is very little about how to design good hash
  functions for anything other than numbers and maybe strings.
 
 1. tries also work only for strings and other lists
 2. i don't want to go into discussing well-known pluses and minuses of
 data structures. my point was just that we have great alternative to
 trees/tries which should be implemented many years ago. i've used a
 lots of assoclists in my program, sometimes this really degraded
 performance (although it's yet another question - why tree/trie
 structures doesn't provide simple List.lookup replacement function and
 why i'm so lazy to still not learning their APIs)
 

Seems like you can make a pure hashtable by unsafePerformIO on the
impure one, and exporting only 'lookup'..

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-28 Thread Bulat Ziganshin
Hello Don,

Thursday, August 28, 2008, 10:32:43 AM, you wrote:

 Seems like you can make a pure hashtable by unsafePerformIO on the
 impure one, and exporting only 'lookup'..

probably yes, but it will lose a bit of performance due to incremental
building of hashtable. actually, writing HT module from scratch is
very easy - all we need is a prime searching function (in order to
establish array size). everything else:

data HT a b = HT (a-Int) (Array Int [(a,b)])

create size hash list = HT hashfunc
   (accumArray (flip (:))
   []
   (0, arrsize-1)
   (map (\(a,b) - (hashfunc a,b)) list)
   )

  where arrsize =  head$ filter (size)$ iterate (\x-3*x+1) 1
hashfunc a  =  hash a `mod` arrsize


lookup a (HT hash arr) = List.lookup a (arr!hash a)




-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-28 Thread Jason Dusek
Jules Bean [EMAIL PROTECTED] wrote:
 Jason Dusek wrote:
  I would much rather have a pure Trie that is foldable. If we
  have a Trie, we get a space efficient sorted list, too.

 Well, Data.Sequence can be used as a space efficient sorted
 list which is Foldable - if you make the decision to insert
 elements into it in a sorted way, obviously.

 What advantages would a Trie have over Data.Sequence?

  A trie is guaranteed sorted -- so using a trie amounts to a
  type level guarantee for binary search and any other
  algorithm that relies on sortedness.

--
_jsn
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-28 Thread Jules Bean

Jason Dusek wrote:
 Jules Bean [EMAIL PROTECTED] wrote:
 Jason Dusek wrote:
 I would much rather have a pure Trie that is foldable. If we
 have a Trie, we get a space efficient sorted list, too.
 Well, Data.Sequence can be used as a space efficient sorted
 list which is Foldable - if you make the decision to insert
 elements into it in a sorted way, obviously.

 What advantages would a Trie have over Data.Sequence?

   A trie is guaranteed sorted -- so using a trie amounts to a
   type level guarantee for binary search and any other
   algorithm that relies on sortedness.

...No more so than a simple wrapper over a Data.Sequence which puts the 
elements in the right place.


Insert for Data.Sequence is log(i) where i is the position of the 
insertion; clearly bounded by log(n). toList is O(n) and index is (at 
worst) log(i).


I think the corresponding operations with tries are log(n), so 
asymptotically tries are identical for 'uniformly distributed' keys 
although fingertrees are faster if there is a bias towards elements near 
the ends of the lists.


So, it's all in the constants, isn't it?

Jules
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-28 Thread Jan-Willem Maessen


On Aug 27, 2008, at 4:31 PM, Bulat Ziganshin wrote:


Hello Jan-Willem,

Wednesday, August 27, 2008, 4:06:11 PM, you wrote:


One obvious way to make non-modifiable hash tables useful is to eat
your own tail non-strictly--- aggregate a set of hash table entries,
construct a hash table from them, and plumb the resulting hash table
into the original computation by tying the knot.  This works really
well if you can construct the bucket lists lazily and if you specify
the table size up front.


i think it's impossible since you need to scan whole assoclist to
build list of entries for any value of hash function.


I think this is because I wasn't quite clear enough about the problem  
I was solving.  I think you'll agree that we can use an assocList non- 
strictly in the following sense:
  * We can do lookups that succeed so long as we can compute all keys  
up to and including the key that matches.
  * We never do non-strict lookups that fail, as that would require  
examining the entire assocList.


Now I can build a hashtable with the same property from an assocList,  
albeit very inefficiently, using code like this (untested):


lazyArray :: (Ix i) = (i,i) - [(i,v)] - Array i [v]
lazyArray bnds kvs =
array bnds [ (k, map snd . filter ((k==) . fst) $ kvs) | k -  
range bnds ]


makeHash :: (Eq k, Hashable k) = [(k,v)] - Array Int [(k,v)]
makeHash assocList =
lazyArray (0,n-1) labeledAssocList
  where labeledAssocList = [ (hashToSize n k, (k,v)) | (k,v) -  
assocList ]


We label each assocList element with its corresponding hash bucket  
(labeledAssocList); each bucket then contains exactly the elements of  
assocList that map to that bucket, in exactly the order in which they  
occurred in assocList.


The LazyArray library in hbc essentially did exactly what the  
lazyArray function I've shown above does, only the input list is  
traversed once rather than having a separate traversal for each  
bucket.  This can actually be implemented using the ST monad augmented  
by unsafeFreezeSTArray and unsafeInterleaveST; indeed the State in  
Haskell paper by Peyton Jones and Launchbury gives the implementation  
of a very similar function.


I have code for LazyArray based on the above paper that works with  
GHC, but I haven't needed it in a while.


-Jan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-28 Thread Jason Dusek
Jules Bean [EMAIL PROTECTED] wrote:
 Jason Dusek wrote:
  Jules Bean [EMAIL PROTECTED] wrote:
   Jason Dusek wrote:
I would much rather have a pure Trie that is foldable.
If we have a Trie, we get a space efficient sorted list,
too.
  
   Well, Data.Sequence can be used as a space efficient
   sorted list which is Foldable - if you make the decision
   to insert elements into it in a sorted way, obviously.
  
   What advantages would a Trie have over Data.Sequence?
 
  A trie is guaranteed sorted -- so using a trie amounts to a
  type level guarantee for binary search and any other
  algorithm that relies on sortedness.

 ...No more so than a simple wrapper over a Data.Sequence which
 puts the elements in the right place.

  If by a wrapper you mean a function, well that's not type
  level at all. A binary search that insists on tries will, with
  a correct trie implementation, behave correctly on every
  input; a binary search that works with `Data.Sequence` will
  fail on a substantial portion of the inputs accepted by the
  type system, regardless of the data structure's correctness.

--
_jsn
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-28 Thread Jason Dusek
  Well, sure, that could work too.

--
_jsn
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-28 Thread Richard A. O'Keefe


On 28 Aug 2008, at 9:07 pm, Jules Bean wrote:
Insert for Data.Sequence is log(i) where i is the position of the  
insertion; clearly bounded by log(n). toList is O(n) and index is  
(at worst) log(i).


I think the corresponding operations with tries are log(n),


Let the key you want to insert have length L.
Then insertion into a trie is O(L), independent of the size of the  
collection.

If the alphabet the key's elements are drawn from is large,
you can use a Ternary Search Tree, and insertion is
then O(L.lg B) where B is the branching factor.
There are fast Trie construction algorithms which are linear
in the total size of the collection, \sum_{i=1}^{n}L_i.

The worst case for Data.Sequence is where keys mostly vary at
the end, in which case comparisons take O(L) time, and the
cost is O(L.lg n), where n is usually much bigger than B.


So, it's all in the constants, isn't it?


No.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Pure hashtable library

2008-08-27 Thread Bulat Ziganshin
Hello haskell-cafe,

solving one more task that uses English dictionary, i've thought: why we don't 
yet have pure hashtable library? There is imperative hashtables, pretty complex 
as they need to rebuild entire table as it grows. There is also simple assoc 
lists and tree/trie implementations, but there is no simple non-modifiable 
hashes.

how should it look:
* hashtable is represented as an array of assoc lists: Array Int [(a,b)]

* interface to extract data from ht is the same as from assoc list:
lookup :: HT a b - a - Maybe b

* ht may be built from assoc list. we should just know it's size beforehand in 
order to create Array of reasonable size. constructor also need a hashing 
function:

create :: [(a,b)] - Int - (a-Int) - HT a b


given these constraints, it should be just a 10-20 lines of code, and provide 
much better efficiency than any tree/trie implementations


-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Jason Dusek
Bulat Ziganshin [EMAIL PROTECTED]:
 given these constraints, it should be just a 10-20 lines of
 code, and provide much better efficiency than any tree/trie
 implementations

  Much better efficiency in what way?

--
_jsn
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Bulat Ziganshin
Hello Jason,

Wednesday, August 27, 2008, 11:55:31 AM, you wrote:

 given these constraints, it should be just a 10-20 lines of
 code, and provide much better efficiency than any tree/trie
 implementations

   Much better efficiency in what way?

instead of going through many levels of tree/trie, lookup function will just 
select array element by hash value and look through a few elements in assoc 
list:

data HT a b = HT (a-Int)   -- hash function
 (Array Int [(a,b)])

HT.lookup (HT hash arr) a   =   List.lookup (arr!hash a) a



-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Thomas Davie


On 27 Aug 2008, at 10:09, Bulat Ziganshin wrote:


Hello Jason,

Wednesday, August 27, 2008, 11:55:31 AM, you wrote:

 given these constraints, it should be just a 10-20 lines of
 code, and provide much better efficiency than any tree/trie
 implementations

   Much better efficiency in what way?

instead of going through many levels of tree/trie, lookup function  
will just select array element by hash value and look through a few  
elements in assoc list:


data HT a b = HT (a-Int)   -- hash function
 (Array Int [(a,b)])

HT.lookup (HT hash arr) a   =   List.lookup (arr!hash a) a
Which makes two assumptions.  One is that your array is big enough  
(believable), and the other, that your font is big enough.


Bob

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Bayley, Alistair
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Thomas Davie
 
  Much better efficiency in what way?
 
   instead of going through many levels of tree/trie, 
 lookup function will just select array element by hash value 
 and look through a few elements in assoc list:
 
   data HT a b = HT (a-Int)   -- hash function
(Array Int [(a,b)])
 
   HT.lookup (HT hash arr) a   =   List.lookup (arr!hash a) a
 
 Which makes two assumptions.  One is that your array is big 
 enough (believable), and the other, that your font is big enough.


 ... and the other, that your font is big enough.

Que? This is lost on me. Care to explain?

Alistair
*
Confidentiality Note: The information contained in this message,
and any attachments, may contain confidential and/or privileged
material. It is intended solely for the person(s) or entity to
which it is addressed. Any review, retransmission, dissemination,
or taking of any action in reliance upon this information by
persons or entities other than the intended recipient(s) is
prohibited. If you received this in error, please contact the
sender and delete the material from any computer.
*

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Thomas Davie


On 27 Aug 2008, at 10:39, Bayley, Alistair wrote:


From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Thomas Davie

   Much better efficiency in what way?

instead of going through many levels of tree/trie,
lookup function will just select array element by hash value
and look through a few elements in assoc list:

data HT a b = HT (a-Int)   -- hash function
 (Array Int [(a,b)])

HT.lookup (HT hash arr) a   =   List.lookup (arr!hash a) a

Which makes two assumptions.  One is that your array is big
enough (believable), and the other, that your font is big enough.




... and the other, that your font is big enough.


Que? This is lost on me. Care to explain?


Sorry, I probably should have sent that, it was a dig at the fact that  
the message was sent with all the text in font-size 930 or so.


Bob
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Stephan Friedrichs
Bulat Ziganshin wrote:
 solving one more task that uses English dictionary, i've thought: why we
 don't yet have pure hashtable library? There is imperative hashtables,
 [...]
 how should it look:
 
 * hashtable is represented as an array of assoc lists: Array Int [(a,b)]

Don't immutable arrays get rather inefficient when modified?

 
 [...]

Just a tought: You might want to have a look at the bloom filter
implementation. On one hand, it is an alternative for your dictionary
and on the other and, its implementation uses hash functions and arrays
as well. IIRC it does that in a state monad that keeps the array mutable
and finally freezes it before usage, which might be a good idea for pure
hash table as well.

//Stephan

-- 

Früher hieß es ja: Ich denke, also bin ich.
Heute weiß man: Es geht auch so.

 - Dieter Nuhr
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Johannes Waldmann

 * hashtable is represented as an array of assoc lists: Array Int [(a,b)]
 
 Don't immutable arrays get rather inefficient when modified?

Bulat was specifically asking for simple *non-modifiable* hashes
http://article.gmane.org/gmane.comp.lang.haskell.cafe/43612

J.W.



signature.asc
Description: OpenPGP digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Jan-Willem Maessen

On Aug 27, 2008, at 3:41 AM, Bulat Ziganshin wrote:


Hello haskell-cafe,

solving one more task that uses English dictionary, i've thought:  
why we don't yet have pure hashtable library? There is imperative  
hashtables, pretty complex as they need to rebuild entire table as  
it grows. There is also simple assoc lists and tree/trie  
implementations, but there is no simple non-modifiable hashes.


I know that Lennart had such a hashtable implementation as part of the  
hbcc source tree (so dating back to the late stream age or the very  
very early monad age), though I think it relied upon hbc's LazyArray.


One obvious way to make non-modifiable hash tables useful is to eat  
your own tail non-strictly--- aggregate a set of hash table entries,  
construct a hash table from them, and plumb the resulting hash table  
into the original computation by tying the knot.  This works really  
well if you can construct the bucket lists lazily and if you specify  
the table size up front.  You can't make this trick work at all for  
tree-based maps in general, because the structure of the tree depends  
upon all the keys inserted.  You also can't make this trick work if  
you base the size of the hash table on the number of keys inserted,  
maximum bucket load, etc.  Finally, it doesn't work with strict arrays  
at all.


So a nice niche for a small and clever static hash table.

-Jan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Jules Bean

Bulat Ziganshin wrote:

Hello haskell-cafe,

solving one more task that uses English dictionary, i've thought: why we don't 
yet have pure hashtable library? There is imperative hashtables, pretty complex 
as they need to rebuild entire table as it grows. There is also simple assoc 
lists and tree/trie implementations, but there is no simple non-modifiable 
hashes.

how should it look:
* hashtable is represented as an array of assoc lists: Array Int [(a,b)]

* interface to extract data from ht is the same as from assoc list:
lookup :: HT a b - a - Maybe b

* ht may be built from assoc list. we should just know it's size beforehand in 
order to create Array of reasonable size. constructor also need a hashing 
function:

create :: [(a,b)] - Int - (a-Int) - HT a b


given these constraints, it should be just a 10-20 lines of code, and provide 
much better efficiency than any tree/trie implementations


Prove it.

To get much better efficient than a trie, the hash function has to be 
so fast that it is faster than following (log n) pointers, and yet also 
so perfect that it doesn't generate too many collisions.


As you correctly say, a simple implementation is easy to do, so why not 
do it and see how it performs? :)


Jules

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Bulat Ziganshin
Hello Jules,

Wednesday, August 27, 2008, 7:21:46 PM, you wrote:

 given these constraints, it should be just a 10-20 lines of code, and 
 provide much better efficiency than any tree/trie implementations

 Prove it.

 To get much better efficient than a trie, the hash function has to be
 so fast that it is faster than following (log n) pointers

afaiu, trie search follows n pointers - i.e. every char in string
forms a new trie level. add to this that every search need to scan a
list of all possible chars - or you need to use the same hashing. so,
we have the following lookup times:

trie: O(len)*O(width)
hashed trie: O(len)
hash: O(len)

where len is length of string searched and width is average amount of
chars at each trie level



-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Jed Brown
On Wed 2008-08-27 16:21, Jules Bean wrote:
 Bulat Ziganshin wrote:
 Hello haskell-cafe,

 solving one more task that uses English dictionary, i've thought: why we 
 don't yet have pure hashtable library? There is imperative hashtables, 
 pretty complex as they need to rebuild entire table as it grows. There is 
 also simple assoc lists and tree/trie implementations, but there is no 
 simple non-modifiable hashes.

 how should it look:
 * hashtable is represented as an array of assoc lists: Array Int [(a,b)]

 * interface to extract data from ht is the same as from assoc list:
 lookup :: HT a b - a - Maybe b

 * ht may be built from assoc list. we should just know it's size beforehand 
 in order to create Array of reasonable size. constructor also need a hashing 
 function:

 create :: [(a,b)] - Int - (a-Int) - HT a b


 given these constraints, it should be just a 10-20 lines of code, and 
 provide much better efficiency than any tree/trie implementations

 Prove it.

 To get much better efficient than a trie, the hash function has to be  
 so fast that it is faster than following (log n) pointers, and yet also  
 so perfect that it doesn't generate too many collisions.

Many people have probably seen this and it has nothing to do with
Haskell, but it is a good performance comparison of a simple hash to an
optimized trie.

  http://www.nothings.org/computer/judy/

The conclusion (we're only interested in lookup times) is that the trie
is preferable for sequential lookups, slower for random access hits, and
about the same for random access misses.  Also, the hash was uniformly
better for small sizes ( 10k).

BTW a single cache miss has a latency around 250 cycles, you can compute
a hell of a hash for that.

Jed


pgpBGCqLEkPII.pgp
Description: PGP signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Daniel Fischer
Am Mittwoch, 27. August 2008 17:36 schrieb Bulat Ziganshin:
 Hello Jules,

 Wednesday, August 27, 2008, 7:21:46 PM, you wrote:
  given these constraints, it should be just a 10-20 lines of code, and
  provide much better efficiency than any tree/trie implementations
 
  Prove it.
 
  To get much better efficient than a trie, the hash function has to be
  so fast that it is faster than following (log n) pointers

 afaiu, trie search follows n pointers - i.e. every char in string
 forms a new trie level. add to this that every search need to scan a
 list of all possible chars - or you need to use the same hashing. so,
 we have the following lookup times:

 trie: O(len)*O(width)
 hashed trie: O(len)
 hash: O(len)

Wouldn't the hashtable have lookup time O(len+bucketsize)?


 where len is length of string searched and width is average amount of
 chars at each trie level

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Jules Bean

Bulat Ziganshin wrote:

Hello Jules,

Wednesday, August 27, 2008, 7:21:46 PM, you wrote:


given these constraints, it should be just a 10-20 lines of code, and provide 
much better efficiency than any tree/trie implementations



Prove it.



To get much better efficient than a trie, the hash function has to be
so fast that it is faster than following (log n) pointers


afaiu, trie search follows n pointers 


No.

n is the number of strings in my data set (dictionary).

If I have n strings the average string length is asymptotically, in 
some sense, log n. Of course for particular data sets it's may be more .


But log n is the length of the shortest unique coding, it's also the 
number of characters you typically need to traverse before you have 
reached a unique prefix, at which point your trie can short-circuit.


I appreciate that I didn't define my terminology ;) You might have a 
different n.


I repeat my challenge Prove it. I will be interested to see how much a 
good immutable hash outperforms Data.Map.


I would then also be interested to see how much it outperforms a decent 
Data.Map (such as your own AVL one) and a decent trie.


Jules

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Bulat Ziganshin
Hello Daniel,

Wednesday, August 27, 2008, 8:01:24 PM, you wrote:

 trie: O(len)*O(width)
 hashed trie: O(len)
 hash: O(len)

 Wouldn't the hashtable have lookup time O(len+bucketsize)?

i suppose that bucketsize=O(1): constructor should get [approximate]
size of hashed assoclist and it will select appropriate Array size



-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Bulat Ziganshin
Hello Stephan,

Wednesday, August 27, 2008, 1:52:23 PM, you wrote:
 and on the other and, its implementation uses hash functions and arrays
 as well. IIRC it does that in a state monad that keeps the array mutable
 and finally freezes it before usage, which might be a good idea for pure
 hash table as well.

actually, any building of immutable arrays internally done in this
way. we just don't need to write this low-level function ourselves as
Array library provides us pure function to build array from
list/assoclist


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Bulat Ziganshin
Hello Jan-Willem,

Wednesday, August 27, 2008, 4:06:11 PM, you wrote:

 One obvious way to make non-modifiable hash tables useful is to eat
 your own tail non-strictly--- aggregate a set of hash table entries,
 construct a hash table from them, and plumb the resulting hash table  
 into the original computation by tying the knot.  This works really  
 well if you can construct the bucket lists lazily and if you specify  
 the table size up front.

i think it's impossible since you need to scan whole assoclist to
build list of entries for any value of hash function. actually, the
same is true for any array building code - while the *values* of array
elements may be calculated lazy, *positions* cannot



-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Bulat Ziganshin
Hello Jules,

Wednesday, August 27, 2008, 7:59:55 PM, you wrote:

 To get much better efficient than a trie, the hash function has to be
 so fast that it is faster than following (log n) pointers
 
 afaiu, trie search follows n pointers 

 No.

 n is the number of strings in my data set (dictionary).

 If I have n strings the average string length is asymptotically, in 
 some sense, log n. Of course for particular data sets it's may be more .

 But log n is the length of the shortest unique coding, it's also the
 number of characters you typically need to traverse before you have 
 reached a unique prefix, at which point your trie can short-circuit.

 I appreciate that I didn't define my terminology ;) You might have a 
 different n.

 I repeat my challenge Prove it. I will be interested to see how much a
 good immutable hash outperforms Data.Map.

 I would then also be interested to see how much it outperforms a decent
 Data.Map (such as your own AVL one) and a decent trie.

1) i don't have time/interest to do it, so i just pushed the idea

2) what you have proved there is that for set of n randomly chosen
strings trie lookup need O(log n) time - the same is true for hash

3) practical performance of trie will suffer by the need to follow
several trie levels each providing several elements to check. OTOH
hash lookup will usually need only 1-2 checks, including one check on
full string size

4) using ByteString for hash indexes would make lookups much faster

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Richard A. O'Keefe

Someone wrote:


trie: O(len)*O(width)
hashed trie: O(len)
hash: O(len)


If width here refers to the branching factor of the trie,
it's actually O(len.lg(width)), and the width that matters
is not the *possible* number of choices but the *actual*
number.

The great problem with hash tables is devising good hash
functions.  There is a vast literature about hash tables,
but there is very little about how to design good hash
functions for anything other than numbers and maybe strings.
It would be nice to think that _had__ there been plenty of
good advice about constructing good hash functions the Java
library implementors would have taken it.  As it is, their
hashing functions for collections leave much to be desired.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Jason Dusek
  I would much rather have a pure Trie that is foldable. If we
  have a Trie, we get a space efficient sorted list, too.

--
_jsn
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Pure hashtable library

2008-08-27 Thread Jules Bean

Jason Dusek wrote:

  I would much rather have a pure Trie that is foldable. If we
  have a Trie, we get a space efficient sorted list, too.



Well, Data.Sequence can be used as a space efficient sorted list which 
is Foldable - if you make the decision to insert elements into it in a 
sorted way, obviously.


It's a fingertree not a trie, of course.

What advantages would a Trie have over Data.Sequence?

Jules
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe