RE: [Haskell-cafe] Re: Hashtable woes

2006-02-22 Thread Simon Marlow
On 21 February 2006 17:21, Chris Kuklewicz wrote:

 From the shooutout itself:
 

http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotidelan
g=ghcid=3
 
 and
 

http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotidelan
g=ghcid=2
 
 (I forget the exact different between them)
 
 From the wiki (the Current Entry):
 

http://haskell.org/hawiki/KnucleotideEntry#head-dfcdad61d34153143175bb9f
8237d87fe0813092

Thanks... sorry for being a bit dim, but how do I make this test run for
longer?  I downloaded the example input.  The prog doesn't seem to take
an argument, although the shootout suggests it should be given a
parameter of 25.

Cheers,
Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Hashtable woes

2006-02-22 Thread Sebastian Sylvan
On 2/22/06, Simon Marlow [EMAIL PROTECTED] wrote:
 On 21 February 2006 17:21, Chris Kuklewicz wrote:

  From the shooutout itself:
 
 
 http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotidelan
 g=ghcid=3
 
  and
 
 
 http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotidelan
 g=ghcid=2
 
  (I forget the exact different between them)
 
  From the wiki (the Current Entry):
 
 
 http://haskell.org/hawiki/KnucleotideEntry#head-dfcdad61d34153143175bb9f
 8237d87fe0813092

 Thanks... sorry for being a bit dim, but how do I make this test run for
 longer?  I downloaded the example input.  The prog doesn't seem to take
 an argument, although the shootout suggests it should be given a
 parameter of 25.


I think you need to run the Fasta benchmark with N=25 to generate
the input file for this benchmark...

/S
--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Hashtable woes

2006-02-22 Thread Ketil Malde
Sebastian Sylvan [EMAIL PROTECTED] writes:

 I think you need to run the Fasta benchmark with N=25 to
 generate the input file for this benchmark...

I made the file available at http://www.ii.uib.no/~ketil/knuc.input

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Hashtable woes

2006-02-21 Thread Simon Marlow

Brian Sniffen wrote:

On 2/10/06, Ketil Malde [EMAIL PROTECTED] wrote:



Hmm...perhaps it is worth it, then?  The benchmark may specify hash
table, but I think it is fair to interpret it as associative data
structure - after all, people are using associative arrays that
(presumably) don't guarantee a hash table underneath, and it can be
argued that Data.Map is the canonical way to achieve that in Haskell.



Based on this advice, I wrote a k-nucleotide entry using the rough
structure of the OCaml entry, but with the manual IO from Chris and
Don's Haskell #2 entry.  It runs in under 4 seconds on my machine,
more than ten times the speed of the fastest Data.HashTable entry.


I haven't been following this too closely, but could someone provide me 
with (or point me to) the badly performing Data.HashTable example, so we 
can measure our improvements?


Cheers,
Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Hashtable woes

2006-02-21 Thread Chris Kuklewicz
Simon Marlow wrote:
 Brian Sniffen wrote:
 On 2/10/06, Ketil Malde [EMAIL PROTECTED] wrote:


 Hmm...perhaps it is worth it, then?  The benchmark may specify hash
 table, but I think it is fair to interpret it as associative data
 structure - after all, people are using associative arrays that
 (presumably) don't guarantee a hash table underneath, and it can be
 argued that Data.Map is the canonical way to achieve that in Haskell.


 Based on this advice, I wrote a k-nucleotide entry using the rough
 structure of the OCaml entry, but with the manual IO from Chris and
 Don's Haskell #2 entry.  It runs in under 4 seconds on my machine,
 more than ten times the speed of the fastest Data.HashTable entry.
 
 I haven't been following this too closely, but could someone provide me
 with (or point me to) the badly performing Data.HashTable example, so we
 can measure our improvements?
 
 Cheers,
 Simon

From the shooutout itself:

http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotidelang=ghcid=3

and

http://shootout.alioth.debian.org/gp4/benchmark.php?test=knucleotidelang=ghcid=2

(I forget the exact different between them)

From the wiki (the Current Entry):

http://haskell.org/hawiki/KnucleotideEntry#head-dfcdad61d34153143175bb9f8237d87fe0813092
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Hashtable woes

2006-02-15 Thread John Meacham
On Wed, Feb 15, 2006 at 09:42:10AM +0100, Ketil Malde wrote:
 
 Not sure how relevant this is, but I see there is a recently released
 hash library here that might be a candidate for FFIing?
 
 https://sourceforge.net/projects/goog-sparsehash/
 
 | An extremely memory-efficient hash_map implementation. 2 bits/entry
 | overhead! The SparseHash library contains several hash-map
 | implementations, including implementations that optimize for space
 | or speed.

If we want really fast maps, we should be using this. it beats the
competition by far:

 http://judy.sourceforge.net/

John

-- 
John Meacham - ⑆repetae.net⑆john⑈
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Hashtable woes

2006-02-15 Thread Jan-Willem Maessen


On Feb 15, 2006, at 3:42 AM, Ketil Malde wrote:



Not sure how relevant this is, but I see there is a recently released
hash library here that might be a candidate for FFIing?

https://sourceforge.net/projects/goog-sparsehash/


The real issue isn't the algorithms involved; I saw the best  
performance from the stupidest hash algorithm (well, and switching to  
multiplicative hashing rather than mod-k).  The problem is GC of hash  
table elements.  FFI-ing this library would give us really good  
algorithms, but the GC would all indirect through the FFI and I'd  
expect that to make things *worse*, not better.


-Jan



| An extremely memory-efficient hash_map implementation. 2 bits/entry
| overhead! The SparseHash library contains several hash-map
| implementations, including implementations that optimize for space
| or speed.

-k
--
If I haven't seen further, it is by standing in the footprints of  
giants


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Hashtable woes

2006-02-10 Thread Chris Kuklewicz
Ketil Malde wrote:
 Chris Kuklewicz [EMAIL PROTECTED] writes:
 
 Is Jan-Willem Maessen's Hash available anywhere?  I could benchmark it.
 
 Did you ever get around to run the benchmark?  I browsed around a bit,
 and found that the knucleotide is probably the worst GHC benchmark in
 the shootout (even TCL beats GHC by a factor of two!) - which is
 disheartening, because I rely a lot on associative data structures
 (usually Data.Map) in my programs.
 
 Or have Adrian Hey's AVL-trees been tried?
 
 -k

No, I did not try it.  This message from Simon Marlow

 Jan-Willem's HashTable attached.  It uses unsafeThaw/unsafeFreeze tricks
 to avoid the GC overheads, for this you need an up to date GHC due to a
 bug in the garbage collector: grab a STABLE snapshot (6.4.1 won't work).
 Or remove the unsafeThaw/unsafeFreeze to use it with 6.4.1, and be
 prepared to bump the heap size.
 
 In GHC 6.6 the unsafeThaw/unsafeFreeze tricks aren't required, because
 the GC is essentially doing it for you - we put a write barrier in the
 IOArray implementation.

indicates that it triggers a bug in 6.4.1, which is what the shootout is using.
And I suspected bumping the heap size just won't cut it for the amount of data
we are processing.

But I did not test that suspicion.

We never pounded on Data.Map, but I suspect it cannot be as bad as 
Data.Hashtable.

-- 
Chris
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Hashtable woes

2006-02-10 Thread Ketil Malde

 indicates that it triggers a bug in 6.4.1

Ah, I missed that.

For my word counting indexes, I've settled on Data.Map, calculating an
Int or Integer hash for each word (depending on word length, which is
fixed).  I haven't given it nearly the effort the shootout programs
have seen, though, so I'm not sure how optimal it is.

Other experiences with FiniteMap/Data.Map etc seem to indicate that
they are in the same ball park as Python's hashes.

 We never pounded on Data.Map, but I suspect it cannot be as bad as
 Data.Hashtable. 

Hmm...perhaps it is worth it, then?  The benchmark may specify hash
table, but I think it is fair to interpret it as associative data
structure - after all, people are using associative arrays that
(presumably) don't guarantee a hash table underneath, and it can be
argued that Data.Map is the canonical way to achieve that in Haskell. 

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Hashtable woes

2006-02-10 Thread Brian Sniffen
On 2/10/06, Ketil Malde [EMAIL PROTECTED] wrote:

 Hmm...perhaps it is worth it, then?  The benchmark may specify hash
 table, but I think it is fair to interpret it as associative data
 structure - after all, people are using associative arrays that
 (presumably) don't guarantee a hash table underneath, and it can be
 argued that Data.Map is the canonical way to achieve that in Haskell.

Based on this advice, I wrote a k-nucleotide entry using the rough
structure of the OCaml entry, but with the manual IO from Chris and
Don's Haskell #2 entry.  It runs in under 4 seconds on my machine,
more than ten times the speed of the fastest Data.HashTable entry.

--
Brian T. Sniffen
[EMAIL PROTECTED]or[EMAIL PROTECTED]
http://www.evenmere.org/~bts
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Hashtable woes

2006-01-23 Thread Simon Marlow

Bulat Ziganshin wrote:

Hello Chris,

Monday, January 23, 2006, 12:27:53 PM, you wrote:

CK The only mutable data structure that come with GHC besides arrays is
CK Data.Hashtable, which is not comptetitive with OCaml Hashtbl or DMD's
CK associative arrays (unless there is something great hidden under
CK Data.Graph). Is there any hope for GHC 6.6? Does anyone have pointers to
CK an existing library at all? Perl and Python and Lua also have excellent
CK built in hashtable capabilities. Where is a good library for Haskell?

1) are you used +RTS -A10m / +RTS -H100m?

2) Simon Marlow optimized something in the IOArray handling, but i
don't understand that is changed. see http://cvs.haskell.org/trac/ghc/ticket/650


Much of the GC overhead of Data.Hash should be gone in GHC 6.6.  I also 
have an improved implementation of Data.Hash from Jan-Willem Maessen to 
import, but that will be 6.6 rather than 6.4.2.


Cheers,
Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Hashtable woes

2006-01-23 Thread Chris Kuklewicz
Simon Marlow wrote:
 Bulat Ziganshin wrote:
 Hello Chris,

 Monday, January 23, 2006, 12:27:53 PM, you wrote:

 CK The only mutable data structure that come with GHC besides arrays is
 CK Data.Hashtable, which is not comptetitive with OCaml Hashtbl or DMD's
 CK associative arrays (unless there is something great hidden under
 CK Data.Graph). Is there any hope for GHC 6.6? Does anyone have
 pointers to
 CK an existing library at all? Perl and Python and Lua also have
 excellent
 CK built in hashtable capabilities. Where is a good library for Haskell?

 1) are you used +RTS -A10m / +RTS -H100m?

 2) Simon Marlow optimized something in the IOArray handling, but i
 don't understand that is changed. see
 http://cvs.haskell.org/trac/ghc/ticket/650
 
 Much of the GC overhead of Data.Hash should be gone in GHC 6.6.  I also
 have an improved implementation of Data.Hash from Jan-Willem Maessen to
 import, but that will be 6.6 rather than 6.4.2.
 
 Cheers,
 Simon

That is good to hear.  The benchmark's tests take 1,250,000 pre-generated
strings as the keys.  At the end, the string keys are 18 characters long, drawn
randomly from a set of 4 characters.  So the hash computations are a nontrivial 
hit.

Using -A400m I get 39s down from 55s.  That is the best Data.HashTable time I
have seen. (Using -A10m and -A100m were a little slower).

Using my over optimized c-code hashtable I get 12.3 seconds.  The associative
arrays in OCaml and D are still faster.  So you see why I long for GHC 6.6.

Is Jan-Willem Maessen's Hash available anywhere?  I could benchmark it.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Re: Hashtable woes

2006-01-23 Thread Bulat Ziganshin
Hello Chris,

Monday, January 23, 2006, 6:09:15 PM, you wrote:

CK Using -A400m I get 39s down from 55s.  That is the best Data.HashTable time 
I
CK have seen. (Using -A10m and -A100m were a little slower).

1) -A400m is a bit unusual. -H400m for 500-meg machine, -H800m
for 1g computer and so on will be fastest. current GHC doc leaks
explanations in this area, but basically -H just allocates that much
area and then dynamically changes -A after each GC allocating all
available space to the generation-0 memory pool

2) it's better to say that was MUT and GC times in your program, and
even better just to post its output with +RTS -sstderr

please post improved results here. that's really interesting for me,
and for my programs too ;)

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe